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(57) Abstract 

A consumer interactive multi-media system in which a 
CPU (102) is loosely coupled with system memory (108), and a 
graphics manipulation processor (spryte engine) performs sub- 
stantially all of the graphics rendering and manipulation func- 
tions. The spryte system accesses the memory by DMA and has 
a significantly higher bus priority than does tie CPU (102). Gra- 
phic images are stored, rendered and manipulated in a com- 
pressed format (166), both in terms of the number of bits stored 
per pixel and in terms of the number of pixels stored per frame. . 
The frame buffer information is read out from a serial port of 
the system memory and expanded to full 640 by 480 pixel for- 
mat, with a substantially full 24-bit color resolution, all within 
the video display path. The resulting images are nearly of broad- 
cast quality and can be made highly realistic. Commands to 
modify CLUT tables or other parameters in the video display 
path are provided via the display path itself, and so are automat- 
ically synchronized appropriately with pixels, scan lines, fields 
and frames. The system also includes an audio manipulation 
processor (162) which receives audio sample data via DMA 
from the system memory (108), also with a higher priority than 
the CPU (102). 
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1. 



DESCRIPTION OF RELATED ART 
field thS Invention 
The invention involves an 



interactive 



10 



15 



20 



25 



high-performance video and audio entertainment and 
education system, and more particularly, to techniques 
for reducing the cost and improving the performance and 
realism of such a system. 

2. Description of Related Art 

Interactive multi-media systems are systems in which 
the flow of an audio/visual presentation is adjusted in 
response to signals provided interactively by a user. 
In the consumer market, such systems typically run game 
programs or education programs, which are purchased by 
the consumer and loaded into the system. Consumer 
interactive multi-media systems are or have been 
available from such companies as Nintendo K.K., Sega 
Enterprises, Atari Corp., and Commodore-Amiga, among 
others . 

In the past, manufacturers of consumer interactive 
multi-media systems have had to make significant 
compromises in the realism of sounds and video images 
created by the system, in order to maintain a low price 
for the consumer market. For example, ordinary 
television sets have typically been used as the video 
output device of consumer interactive multi-media 
systems. For digitally generated images, a television 
display can support a maximum resolution of 
approximately 480 pixels vertically and 640 pixels 
horizontally. In addition, full color resolution is 
typically considered to require 8 bits of data for each 
of three primary colors (such as red, green, blue) for 
each of the pixels. Accordingly, if such systems were 
to represent a video frame in memory with full color and 
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pixel resolution, a total of 24 bits x 480 x 640 pixels, 
or nearly one megabyte of memory, would be required. 
Techniques have been developed to reduce the number of 
bits required to represent a particular pixel color with 
5 varying degrees of flexibility, but a very large amount 
of memory was still required to store the frame. 

Since memory is expensive, in order to keep the cost 
low, consumer interactive multi-media system 
manufacturers have often implemented reduced pixel 

10 resolution, reducing the number of horizontal and 
vertical pixels which are displayed on the screen from 
the 640 by 480 maximum. This results in a loss of 
detail in the displayed images, greater jaggedness in 
edge boundaries of displayed objects, and more jerky 

15 animation of images. These effects reduce the realism 
of the resulting experience. 

The realism of a multi-media experience . is also 
affected by the frequency with which an animated video 
image can be updated. For maximum realism, the system 

20 should be able to update all 480 x 640 pixels once each 
frame time (30 times per second, for NTSC television). 
Such a pixel rendering rate of over nine million pixels 
per second is far beyond the capabilities of 
conventional low cost multi-media systems, especially if 

25 any significant calculations need to be performed. 
Various techniques have been devised to reduce the 
required pixel writing bandwidth, but again, only at the 
expense of realism. For example, many systems form a 
"background" image, covering the entire frame, and a 

30 separate " sprite " image, covering only a small 
rectangular region in the frame. The pixel writing 
bandwidth is greatly reduced by altering the contents 
and position of only the sprite image at the rate of 30 
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times per second. But an unchanging background image 
reduces the realism of the visual experience. 

Various system architectures have been used in 
multi-media systems. In one form of consumer 

5 interactive multi-media system, a CPU renders and 
modifies images in a video memory which is continuously 
being read out and displayed via a separate port. Such 
an arrangement keeps the cost of the system low since 
the number of separate components is small. However, 

10 all rendering and animation in such a system is limited 
to the speed of the CPU and memory. 

In another form of system, a high-speed CPU and a 
graphics accelerator ("geometry engine") are both 
coupled to a memory in master/slave relationship. The 

15 CPU may be able to manipulate graphic images in the 
memory, but most of the manipulation is performed by the 
graphics accelerator in response to commands issued from 
the CPU to the graphics accelerator. The frame buffer 
data is continuously read out via a separate port of the 

20 memory and provided to the display. Systems such as 
these typically operate at full 24-bit color 
designations, with a full 640 by 480 pixel resolution. 
In order to update the entire frame buffer at the NTSC 
30Hz rate, such systems require powerful hardware which 

25 is extremely expensive and does not reduce the memory 
requirements mentioned above. These architectures are 
not often used in consumer systems. 

Another problem with interactive multi-media systems 
in the past has arisen because it is sometimes desirable 

30 to modify control information in the video output path 
from the memory to the display. If such a modification 
is not performed at a judiciously chosen time relative 
to the pixel clock, the position on a scan line of the 
pixel currently being drawn, and/or the scan line number 
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in a field or frame, or if it is not accomplished within 
a single pixel drawing time, then the image on the 
display may glitch undesirably. 

Accordingly, it is an object of the present 
5 invention to provide a consumer interactive multi-media 
system which ameliorates some or all of the above 
problems . 

SUMMARY OF THE INVENTION 

10 According to the invention, roughly described, a 

consumer interactive multi-media system is provided in 
which a CPU is loosely coupled with system memory, and 
a graphics manipulation processor ("spryte engine") 
performs substantially all of the graphics rendering and 

15 manipulation functions. The spryte system accesses the 
memory by DMA and has a significantly higher bus 
priority than does the CPU. To significantly reduce 
memory and processing power requirements, graphic images 
may be stored, rendered and manipulated in a compressed 

20 format, both in terms of the number of bits stored per 
pixel and in terms of the number of pixels stored per 
frame. The frame buffer information is read out from a 
separate serial port of the system memory and expanded 
to full 640 by 480 pixel format, with a substantially 

25 full 24-bit color resolution, all as part of the video 
display path. The resulting images are nearly of 
broadcast quality and can be made highly realistic. 
Commands to modify CLUT tables or other parameters in 
the video display path are provided via the display path 

30 itself, and therefore are automatically synchronized 
appropriately with pixels, scan lines, fields and 
frames. The system also includes an audio manipulation 
processor which receives audio sample data via DMA from 
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the system memory, also with a higher bus priority than 
the CPU, 

BRIEF DESCRIPTION OF THE DRAWINGS 
5 The invention will be described with respect to 

particular embodiments thereof, and reference will be 
made to the drawings, in which: 

Fig. 1 is a block diagram of major components of the 
system; 

10 Fig. 2 is a symbolic block diagram of the address 

manipulator of Fig. 1; 

Fig. 3 is a block diagram of part of the address 
generator of Fig. 2; 

Fig. 4 is a block diagram of the stack address logic 
15 of Fig. 3; 

Fig. 5 is a symbolic block diagram of part of the 
player bus interface of Fig. 2; 

Fig. 6 is a symbolic bock diagram of the audio/video 
processor of Fig. 1; and 
20 Fig. 7 is a block diagram of the left address pad 

logic of Fig. 3. 
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DETAILED DESCRIPTION 
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I. Overall Architecture 

Fig. 1 is a block diagram showing major components 
of a consumer interactive multi-media system according 
to the invention. It comprises a CP0 102, which may be 
an ARM 60 RISC processor manufactured by Advanced RISC 
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Machines, Ltd., Swaffhaw Bulbeck, Cambridge, U.K. The 
ARM 60 is described in Advanced RISC Machines, "ARM 60 
Datasheet" (1992), incorporated herein by reference. 
The ARM 60 is a relatively low-cost:, general -purpose 32- 
5 bit single-chip microprocessor. The address pins of CPU 
102 are coupled to a 32-bit address bus 104, which is 
provided as an input to an address manipulator chip 106. 
The address manipulator chip 106 contains, among other 
things, an address generator for providing DMA- 

10 generated addresses to system memory, as well as 
addresses from other sources; a D-bus arbiter; two 
spryte engines; and interfaces to a player bus, a slow 
bus and a set of external processors. The address 
manipulator chip 106 generates addresses for system 

15 memory 108, which includes a left memory bank 108A and 
a right memory bank 108B. System memory is 32-bits 
wide, the high-order 16 bits of each 32-bit word being 
in left memory 10 8A and the low-order 16 bits being in 
right memory 108B. The CPU 102 addresses system memory 

20 in words or bytes, but the address manipulator chip 106 
can address each half of the memory entirely 
independently. Address manipulator chip 106 provides 
addresses and control signals to left memory 108A over 
an LA bus 110 and an LCTL bus 112, respectively, and 

25 provides addresses and control signals to right memory 
108B over an RA bus 114 and RCTL bus 116, respectively. 

System memory 108 can include one or two "sets" of 
video RAM (VRAM) and zero, one or two sets of DRAM. A 
set of VRAM contains 512 k bytes of left memory and 512 

30 k bytes of right memory, for a total of one megabyte. 
A set of DRAM is, depending on the system configuration, 
one, four or 16 megabytes long. Other options may also 
be made available. As with VRAM, half of each set is 
located in the left bank of memory and the other half is 
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located in the right bank of memory. System memory 108 
is considered big-endian. 

Any standard DRAM will suffice. For VRAM, an array 
made up of those described in NEC Electronics, Inc., 
5 °/iPD 482234, 482235 Video RAM", Advance Copy Datasheet 
(2/92), or in NEC Corporation, tt /iPD 482445 256K-word x 
16-bit Dual Port Graphics Buffer", Preliminary Datasheet 
(Jan. 1992)/ both of which are incorporated herein by 
reference, are preferred. The former reference 

10 describes a two megabit VRAM and the later describes a 
four megabit VRAM. 

All of the left and right bank system memory sets 
receive the respective left and right half addresses 
generated by the address manipulator chip 106. All of 

15 the left bank sets also include a data port which are 
coupled bi-directionally with a left half data bus 
D(31:16) 118. Similarly, the data ports of all of the 
sets of right bank memory are coupled bi-directionally 
with a right half data bus D(15:0) 120. The VRAM sets 

20 also have a serial port S, which is coupled 
bi-directionally with an S(31:0) bus 122. 

A useful characteristic of many address-multiplexed 
DRAM and VRAM devices is "page mode" operation. The 
multiplexed nature of the address port of these memories 

25 requires that first a row (page) address be strobed into 
the device, then the column address be strobed in. When 
the row address is received by the device, it accesses 
an entire row (page) of memory and makes it available to 
the column selector. The column address is provided to 

30 the column selector. The quantity of addresses in a 
page is determined by the number of bits of address 
value provided with a row address strobe (RAS), and the 
number of bits which are selected by a column address is 
the data width of the device. For example, one of the 
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NEC VRAM devices mentioned above is organized as 256k by 
8 bits. Each row address selects one of 512 rows of 512 
addresses each, and each column address selects one of 
the 512 8-bit bytes contained in the selected row. The 
5 act of using multiple column address strobes for a 
single row address strobe is called "page mode 
operation" , and advantageously avoids the time penalty 
of having to re-establish the row address for each 
access. 

10 In VRAM devices, there is also a static holding 

register that can be loaded with a page of data from the 
internal DRAM. The contents of this holding register 
can be clocked out a serial data port of the device. In 
the above-mentioned NEC VRAM, 512 8 -bit bytes of data 

15 are loaded into the static holding register in response 
to an S-port read transfer command issued to the device. 
These bytes are shifted out the 8-bit-wide S-port of the 
device in response to a serial clock (SC), one byte per 
clock tick. The NEC device also permits the serial port 

20 to be used to write a page of data into a selected row 
of VRAM. 

The NEC device described above splits the static 
holding register into two halves (not to be confused 
with the two halves (banks) of system memory), and the 

25 device will respond to a command to load one half row 
while the other half -row is shifting out. In this 
manner, a continuous stream of data may be shifted out 
of the S-port of the VRAM without having to pause for an 
S-port read transfer cycle. The VRAM generates a QSF 

30 signal indicating which half page is currently being 
clocked out. The QSF signal is coupled over lines 124 
and 126 (for the left and right system memory banks, 
respectively) , to respective LQSF and RQSF inputs of the 
address manipulator 106. 
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Referring again to Fig, 1, address manipulator chip 
106 also provides and receives control signals to and 
from the CPU 102 over lines 128, and is also coupled bi- 
directionally with the left and right data buses 118 and 
5 120. Address manipulator chip 106 also interfaces to a 
slow bus 130, which is an 8-bit bus for accessing such 
devices as a CPU ROM 132, a battery-backed SRAM 133, 
and/or various front panel devices 134. It may also 
support additional CPU-accessible RAM, and may also 

10 support an FM sound generator device. The slow bus 130 
includes 14 bits of the address bus 104 A(16:2), an 8- 
bit data bus PD(7:0), a PDRDB read strobe, a PDWRB write 
strobe, and various control lines. PDRDB and PDWRB are 
used to carry the two low-order address lines for 

15 accessing the 8-bit wide CPU ROM 132. 

Address manipulator chip 106 also interfaces to a 
player bus 136, which is used to connect the system to 
various user input/output devices such as joysticks, 3- 
D glasses, hand controllers and steering wheels, and 

20 also to auxiliary devices such IR pods for wireless 
operation and game saver cartridges. The player bus 136 
is a serial bus. The player bus is described in the 
above -mentioned PLAYER BUS APPARATUS AND METHOD 
application. Briefly, it includes a clock line PBCK 

25 which is driven by the address manipulator chip 106, a 
data output line PDBO which is also driven by address 
manipulator chip 106, and a data input line PBDI which 
is driven by an external device. I/O controllers on the 
player bus 136 are daisy chained with the two serial 

30 data lines feeding through each of the controllers. The 
protocol automatically identifies each controller device 
type, with multiple identical controllers identified 
individually. No user settings are required. Left and 
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right audio signals are also transmitted over lines PBLA 
and PBRA on the player bus 136 to external devices. 

Address manipulator chip 106 is also coupled to a 
control bus 138, which is used to send and receive 
5 control signals to and from other processors in the 
system of Fig, 1. 

The system of Fig. 1 further includes an audio/video 
processor chip 140 which is coupled bi-directionally to 
both halves 118 and 120 of the D-bus, and coupled to 

10 receive data from the 32-bit wide S bus 122. 
Audio/video processor chip 140 is also coupled to the 
control bus 138, and is coupled to receive address bits 
A(15:2) from the system address bus 104. The 
audio/video processor chip 140 generally includes 

15 display path circuitry, an audio subsystem, timers, an 
interrupt controller, an expansion bus interface and a 
watchdog timer. The expansion bus interface couples to 
an expansion bus 142 which includes control lines 144 
and an 8-bit bus 146 carrying multiplexed address and 

20 data information. The expansion bus 142 supports such 
devices as CD/CD-ROM player 148 and optional expansion 
bus RAM 150. The CD/CD-ROM player 148 is built into the 
housing of the system of Fig. 1 and provides the primary 
mechanism by which software i6 loaded into the system 

25 for execution on the CPU 102, It may also be used to 
play standard CD/audio disks, as well as view standard 
photo-CD disks and handle other non-software formats. 

The audio/video processor 140 communicates with 
audio/video output circuitry 152 via audio lines 157, 

30 control lines 156, and a 12- or 24-bit AD bus 158. The 
audio lines 157 and the AD bus 158 are uni-directional 
from the audio/video processor 140 to the audio/video 
encoder 152. The audio/video output circuitry 152 
generally generates the video timing and output video 
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waveforms. It provides a composite video output, an RF 
output for connection to a standard television, an SVHS 
output, and separate left and right audio signal 
outputs. As previously mentioned, the audio outputs are 
5 included in the player bus 136. 

The AD lines 158 carry pixel data in RGB format at 
the rate of 24 bits per pixel time. One pixel time is 
approximately 80nS, the time required for the display 
scan to traverse one square pixel space in the NTSC 

10 standard. Thus the AD lines 158 can be multiplexed to 
actually provide half of a pixel color every 40nS 
( 25MHz ) using only 12 of the 24 AD signal lines. The 
audio output data on lines 157 is provided as a serial 
stream of digital data, at the standard CD-audio rate of 

15 one 16-bit left and right sample every 22.7 microseconds 
(44.1kHz). 

The system of Fig. 1 also includes a video input 
device 160 for "slipstream" capture of externally 
supplied video signals. The video input device 160 is 

20 coupled to the control bus 138 and provides a 32-bit 
output to the S-bus 122. The video input device 160 
also provides synchronization signals to the audio/video 
output circuitry 152. 

When externally supplied video signals are to be 

25 captured, or in some way coordinated with the remainder 
of the system of Fig. 1, the video input device 160 
derives a video clock of approximately 25MHz (twice the 
square pixel frequency) from the analog video input 
signal. The system of Fig. 1 then uses this clock 

30 signal, rather than a local crystal oscillator, to drive 
the entire video display path from system memory, 
through the audio/video processor 140, and through the 
audio/video output circuitry 152. This procedure is 
referred to as "GenLock" . When the system is GenLocked, 
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the video input device 160 decodes the video signal from 
the external analog input, digitizes it into 24-bit 
words each representing a respective pixel. This pixel 
data is processed down to a 16-bit format useful for the 
5 system of Fig. 1, and is accumulated in the video input 
device 160. During an assigned portion of the 
horizontal blanking interval, when the S-bus is not 
needed for providing output pixel data to the video 
display path in the audio/video processor 140, the 
10 system transmits a line of incoming pixel data in a 
burst oyer the S-bus 122, into VRAM. The S-bus is 32 
bits wide, so two 16-bit pixels are transferred at a 
time. 

Alternatively, or additionally, captured video data 

15 may be transmitted over the S-bus 122 directly into the 
audio/video processor 140 for writing out to the 
audio/video output circuitry 152, either in the 16-bit 
or 2 4 -bit format. 

In addition to capturing video data , video input 

20 device 160 can also be used to capture software and 
other forms of data which are being downloaded over the 
external video input path. For example, the external 
video input may be connected to a cable TV network 
cable. At the request of the user of the system of Fig. 

25 1, the cable TV head end equipment may download a game 
program over an otherwise unused cable channel* Video 
input device 160 would capture the software and burst it 
into system memory 108 in the same manner as captured 
video pixels are transmitted to system memory 108, 

30 except that this data can be transferred in an 
uncompressed 32-bit wide format. Instead of displaying 
the data, the system of Fig. 1 would execute it on the 
CPU 102. Alternatively, "downloaded software may be 
received by a device on the expansion bus 142 and 
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accessed by the system of Fig. 1 in a manner which is 
similar to the accessing of CD/CD-ROM player 148. 

The system of Fig. 1 also includes an audio input 
device 162 to capture externally supplied audio data, to 
5 compliment the externally supplied video data. The 
4-wire output of audio/video input device 162 is coupled 
over lines 164 to the audio subsystem in the audio/video 
processor 140. The audio/video processor is 

programmable by the CPU 102 to accept the data in any of 

10 several predefined serial formats. Audio input device 
162 is also coupled to the control bus 138. 

The system of Fig. 1 also includes a decompression 
co-processor 166 which is coupled to the control bus 
138, to bits A(15:2) of the system address bus 104, and 

15 to both halves 118 and 120 of the D bus. Decompression 
co-processor 166 is used to decompress software or image 
data in system memory 108. Typically, such data was 
previously loaded into the system from the CD/CD-ROM 
player 148 or from another external source. 

20 The system of Fig. 1 does not require a user to 

perform any system configuration process, although it 
does permit a user to set certain preferences to 
optimize the audio/video experience in a particular 
environment. 

25 

ii. Memory QEganizflttoh 

A section of system memory starting at address 
zero and extending to either 0, 8, 16 or 32 K bytes, may 
be defined as SYS RAM. If SYSRAM is available in a given 
30 embodiment, the size selection is made by the software. 
The address manipulator chip 106 contains protections 
which prevent user software from reading or writing to 
SYSRAM. 
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All of the system address and timing signals are 
generated by the address manipulator chip 106. Any 
requests for access to system memory from either the CPU 
102 or the audio/video processor 140 pass through the 
5 address manipulator chip 106. The address manipulator 
chip 106 detects page breaks when they occur in random 
memory accesses and anticipates page breaks in 
sequential memory accesses. 

The split read transfer function of the NEC 

10 VRAM, as well as the split write transfer function, are 
special functions of these chips which are controlled by 
address manipulator 106. Other special VRAM functions 
which are supported in the system of Fig. 1 are a write 
to the color register, a flash write, and a CAS before 

15 RAS feature reset. Special VRAM functions are activated 
by performing an access to a memory-mapped address 
corresponding to the desired function, with the desired 
VRAM page address located in address bits 9:2. The data 
bits which are provided over the D-bus when invoking a 

20 write transfer mode are used for masking purposes. A 
"1" in a particular bit will allow the write to occur 
for that bit position and a zero will prevent the write 
from occurring for that bit position. Special VRAM 
functions may be initiated by the CPU 102 or by the 

25 address generator internal to the address manipulator 
chip 106. 

Although the split register feature of the NEC VRAMs 
permits seamless serial output across page boundaries, 
it does not permit seamless serial output across the 
30 boundary from one VRAM device to another. Accordingly, 
most data structures in system memory which are to be 
shifted out of the S-port down the display path, are not 
permitted to cross the one megabyte physical boundary. 
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This limitation can be removed or relaxed if deeper 
VRAMS are used. 

Except for the allocation of SYSRAM, and except for 
the physical VRAM depth limitation on data structures 
5 which are expected to be shifted out the S-port of a 
video RAM, the restrictions on where various portions of 
a software application may be located in system memory 
are minimal- In one embodiment, in a minimum system, 
with only one megabyte of system memory (VRAM) , the low 

10 64k 32-bit words might contain CPU instructions and 
data. The next 300k bytes might contain compressed 
image source data, and the next 172k bytes might contain 
audio and other data. The next 150k bytes might be 
allocated for one frame buffer (320 by 240 pixels by two 

15 bytes per pixel), and the last 150k bytes might be 
allocated for a second frame buffer. 

Frame buffers are arranged so that even numbered 
data lines reside in the left memory bank and odd 
numbered data lines reside in the right memory bank. 

20 Pixels are represented as 16-bit values divided as 
follows: five bits to represent a red pen number, five 
bits to represent a green pen number, four bits to 
represent a blue pen number, and two subposition bits H 
and V. In an alternative data format, either the H bit 

25 or the V bit may be replaced by a fifth blue pen number 
bit. When a pixel value is transmitted down the display 
path, a color look-up table translates each 4- or 5-bit 
pen number to an 8-bit value for the corresponding color 
DAC. The color look-up table can be updated prior to 

.30 each scan line. Pixels are stored at a low resolution 
of 320 by 240 pixels by frame, and the H and V 
subposition bits indicate which quadrant of the low- 
resolution pixel area the designated color is actually 
considered to be located in. 



WO 94/10641 W W PCT/US92/09349 



- 19 - 

Two vertically adjacent pixels are always fetched 
from the frame buffer over the S-bus to generate high- 
resolution pixels for display. The two pixels would be 
vertically adjacent if viewed directly on a display, but 
5 are horizontally adjacent as stored in the VRAM. That 
is, a single address provided to both banks of VRAM is 
sufficient to fetch both vertically adjacent pixels. 

The two data lines which are fetched from the frame 
buffer contain the information required to calculate the 

10 pixel color value for the two high- resolution (640 by 
480) scan lines that they surround (the lower high- 
resolution scan line of the upper low-resolution frame 
buffer data line, and the upper high-resolution scan 
line of the lower low-resolution frame buffer data 

15 line). Two horizontally adjacent sets of upper and 
lower frame buffer data are maintained in the video 
display path and an interpolation is performed to 
determine the actual pixel color for each desired 
high-resolution pixel to be displayed. 

20 Alternatively, frame buffer data can be stored with 

full 480-line resolution. There are two forms of 480- 
line frame buffers. One form actually comprises two 
separate 240-line buffers, each representing the data- 
for a particular field. In this case, the display 

25 operation is handled by disabling vertical interpolation 
and pointing the "current line video address" at the 
appropriate buffer when the image in that buffer is to 
be displayed. Note that the entire screen does not need 
to be in 480 mode. The second form has 480 lines 

30 sequentially in memory. The hardware is placed in a 480 
mode and vertical interpolation is disabled. In this 
case, the video display path automatically retrieves 
alternate scan lines of frame buffer data in order to 
accommodate the interlaced display. 
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III. Control Bus 

Several functions of the system of Fig. 1 are split 
between two chips and coordinated via the control bus 
5 138. These functions include reads and writes from the 
CPU 102 to or from the audio/video processor 140/ DMA 
operations requested by the audio/video processor 140 
and performed by the address manipulator chip 106, and 
video timing. The signals on the control bus 138 

10 include PCSC for video sync timing, DMAREQ for DMA 
requests from the audio/video processor 140 to the 
address manipulator chip 106; CCODE(2:0) for events 
involving the audio/video processor 140; and PLSC and 
PRSC for slipstream timing of captured data to be 

15 transmitted down the video display path together with 
frame buffer pixel data. 

a. Video and Slipstream Timing 

The PCSC line of control bus 138 is a serial 
20 communication line from audio/video processor 140 to the 
address manipulator chip 106 to indicate various video 
conditions. Each code is approximately 8 bits in 
length. The address manipulator chip 106 starts a 
horizontal counter as a result of receiving a PCSC 
25 burst, and video operations internal to the address 
manipulator chip 106 are then timed in response to the 
horizontal counter. In addition, the address 

manipulator chip 106 generates the PLSC and PRSC clocks 
as well as respectively related LSC and RSC clocks. The 
30 LSC and RSC clocks are provided to the VRAM for clocking 
the serial ports, and PLSC and PRSC are used by the 
audio/video processor 140 and any other device on the S- 
bus to correctly identify the time slots on that bus. 
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B. DMA Ope rations Requested fex AttflAQ/VlfleP 

Processor 140 

As described in more detail below, the address 
5 manipulator chip 106 performs DMA operations requested 
by various devices both within and outside of that chip. 
The DMA operations are performed in response to DMA 
requests issued by the various devices, and a priority 
scheme is used in the event that more that one device 

10 requests a transfer at the same time. These transfers 
all occur over the D-bus. 

Audio/video processor 140 contains its own 
prioritization mechanism for DMA requests made by 
devices within the audio/video processor chip 140, and 

15 issues a single request to the address manipulator 106 
whenever any of its devices desires a DMA transfer. 
Thus, all devices within the audio/video processor 140 
together occupy a single priority level in the DMA 
prioritization scheme in the address manipulator 106. 

20 When a device within the audio/video processor 

140 desires a DMA transfer, if there is no other request 
from audio/video processor 140 then pending, the DMA 
requestor within audio/video processor 140 sends the 
desired DMA channel number by serial stream over a 

25 DMAREQ line of control bus 138 to the address 
manipulator chip 106. The audio/video processor 140 
saves the channel number internally as well. DMA 
channel numbers as viewed from within audio/video 
processor 140 are different from those required for the 

30 address generator in address manipulator 106, so a 
translation is performed in the address manipulator chip 
106 before the request is made to the D-bus arbiter. 

When the arbiter grants the D-bus (and the address 
generator in the address manipulator chip 106) to the 

35 requested transfer, the address generator performs the 
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DMA over the D-bus and transmits a CCODE to the 
audio/video processor 140 over the control bus 138, to 
indicate that the requested transfer is taking place. 
A CCODE is transmitted for each word transferred. For 
5 transfers £a audio/video processor 140, the audio/video 
processor 140 recognizes the CCODE, captures the data on 
the D-bus, and transmits it to the appropriate 
requesting unit within the audio/video processor 140. 
For transfers from audio/video processor 140, the 

10 processor 140 enables data from the appropriate 
requesting unit onto the D-bus, in response to each 
CCODE. The address manipulator chip 106 also sends bits 
to the audio/video processor 140 to indicate the end of 
a DMA block, the end of a DMA length, and the status of 

15 looping. After a DMA transfer is fully serviced and the 
audio/video processor 140 has completed its internal 
processing, it can send its next request to the address 
manipulator 106 if appropriate. 

20 C. CPU Accesses to Au dio/Video Processor 140 

When the CPU 102 starts a cycle intended for the 
audio/video processor 140 (or a device on the expansion 
bus 142), the address manipulator 106 issues a request 
on the CCODE lines of control bus 138. The audio/video 

25 processor 140 strobes a CREADY line on control bus 138 
to indicate when the CPU cycle can terminate. Address 
manipulator 106 responds to CREADY by returning READY to 
the CPU 102. Note that control of the display path is 
handled in the CLUT list transfer process described 

30 below. The display path is not directly accessed by the 
CPU 102. 
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d. SCOPE Values 

The CCODE value can indicate the following 
conditions : 

1. NO OP 

5 2. CPU write to Audio/Video Processor 140 

3. DMA operation (Audio/Video Processor 140 knows 
the channel number and whether it is a read or write) 

4. CPU read from Audio/Video Processor 140 
5 . Trace write 

10 6. Trace read 

7. Information code to follow. 

When an information code is sent, it can indicate 
any of the following; 

1 . ABORT pre-warning 
15 2. End of Player bus DMA reached 

3 . End of DMA block service 

4. End of DMA block service AND end of length (if 
the length expires and if DMA looping is allowed this 
code is not sent) 

20 5. Certain violations of SYSRAM (if SYSRAM exists) 

or VRAM or DRAM size 

6. End of DMA block service AND end of length AND 
no DMA looping 

7. DMA request cannot be serviced 

25 

IV. Address Manipulator Chin 106 

Fig. 2 is a symbolic block diagram showing major 
functional units of the address manipulator chip 106 of 
Fig. 1. It comprises an internal 32-bit MDT data bus 
30 202, an internal 22-bit MADR address bus 204. The MDT 
data bus 202 is coupled to the left and right half 
system D-bus 118, 120 via buffers 222. The chip 106 
also includes a CPU interface unit 206 which is coupled 
to receive CPU-generated addresses over the A-bus 104, 
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and also communicates with the CPU 102 over control 
lines 128. Among the control lines 128 is an MCLK 
signal provided by the CPU interface 106 to the MCLK 
input of CPU 102, which is the memory clock input of CPU 
5 102. Address manipulator 106 controls the waveform of 
this clock signal to both stretch CPU cycles for slow 
accesses and to put the CPU 102 to sleep for long 
periods of time. The ARM 60 CPU is a static part which 
does not need maintain any minimum clock input 

10 frequency. 

D-bus arbiter 210 arbitrates requests for control of 
the D-bus. Conceptually, it grants control of both the 
D-bus and address generator 208 as a single resource. 
When the D-bus arbiter 210 grants control of the D-bus 

15 (and address generator 208) to the CPU 102, addresses 
generated by the CPU 102 are passed by the CPU interface 
206 to address generator 208. The address generator 208 
drives the high-order address bits from A(31:16) onto 
the MADR bus 204, where they are decoded by an address 

20 decoder 212. Address decoder 212 determines from these 
bits whether the desired address represents a memory- 
mapped hardware register, in which case it activates the 
appropriate select line to notify the appropriate 
hardware component in the system of Fig. 1. That 

25 hardware component then performs the desired function in 
response to bits A(15:2) of system address bus 104. If 
address decoder 212 determines that the desired address 
is part of system memory 108, then it so notifies the 
address generator 208. Address generator 208 generates 

30 the appropriate addressing and control signals on the 
LCTL and LA buses 112 and 110, and the RCTL and RA buses 
116 and 114. The address generator 208 is also coupled 
to receive LQSF and RQSF signals over respective lines 
124 and 126 to aid in its control of S-port transfers. 
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Address generator 208 receives addresses from the 
CPU via the CPU interface 206 and also from spryte 
engine 214. Address generator 2C8 also maintains a 
stack of DMA control information and can generate 
5 addresses for DMA transfers. The D-bus arbiter 210 
receives requests from the various devices for transfers 
over the D-bus, arbitrates among them, and indicates to 
address generator 208 which request to service. Even 
though the two halves of system memory are addressed and 

10 controlled separately, only one master may be 
operational at a time. If the winning requestor has 
requested a DMA transfer, then the D-bus arbiter 210 
supplies the address generator 208 with a DMA group 
address indicating where in the DMA stack the desired . 

15 control information may be found for the requested 
transfer. In effect, the DMA group address identifies 
a particular DMA channel. The DMA interface is handled 
entirely within the address manipulator chip 106. 

The spryte engine 214 is coupled bi-directionally 

20 with the internal MDT data bus 202 and with low-order 
bits of the internal MADR address bus 204. The 
functions and operation of the spryte engine 214 are 
described in more detail below. 

Address manipulator chip 106 also includes a player 

25 bus interface 216 and a slow bus interface 218, for 
interfacing respectively to the player bus 136 and the 
slow bus 130. Player bus interface 216 and slow bus 
interface 218 are each connected bi-directionally with 
the internal MDT data bus 202 and are activated by 

30 respective select lines generated by address decoder 
212. Address manipulator chip 106 also includes an 
external processor interface 220 which couples the chip 
with the external control bus 138. 
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a. D-Bua Arbiter 21Q 

D-bus arbiter 210 receives requests from various 
requestors for access to the D-bus, the D-port of system 
memory 108, and address generator 208. When D-bus 
5 arbiter 210 grants these resources to a particular 
requestor, it typically sends an acknowledge signal to 
the requestor. Some requestors of DMA transfers do not 
require an acknowledge, but rather simply assume that 
the requested transfer will take place. For requestors 

10 in audio/video processor 140, the acknowledge takes the 
form of a CCODE issued over the system control bus 138 
(Fig. 1) at the time of transfer, as previously 
described. DMA transfers occur in bursts, each burst 
having a maximum burst length which depends on the 

15 requestor. Requestors maintain their requests to 
arbiter 210 until the last burst of transfers required 
to complete the transfer, begins. For some requestors, 
D-bus arbiter 210 can determine in advance the number of 
D-bus cycles to allot to the requestor, in which case 

20 the arbiter 210 then waits the required number of cycles 
and then gives the bus to the next winning requestor. 
For other requestors, the arbiter 210 cannot determine 
in advance the number of cycles to allot. These 
requestors return a "ready" to the arbiter 210 when they 

25 are finished with the bus. In either case, if the 
arbiter 210 receives a higher priority request than the 
one that is currently being serviced, the arbiter 210 
grants the D-bus to the higher priority requestor after 
the current burst and postpones the current requestor. 

30 The transfer requested by the postponed requestor 
resumes after all higher priority requests have been 
serviced, since the postponed requestor will then be the 
highest priority requestor remaining. 
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The order of priority among requestors is as 
follows: 

1. Spryte engine transfers (read or write addresses 
provided to address generator 208 by spryte engine 214) , 

5 after the spryte engine has already acquired the bus; 

2. Mid-line video transfer requests; 

3. Slipstream capture requests , CLOT transfer 
requests, GLUT mid-line requests and start video 
requests; 

10 4. Refresh requests; 

5. Player bus requests; 

6. All requests from audio/video processor 140; 

7. Spryte engine requests for FIFO input data 
(addresses generated by address generator 208); 

15 8. Spryte engine transfers, before the spryte 

engine has acquired the D-bus; and 
9. CPU 102. 

Note that spryte engine transfers have two positions 
on the priority list. If the spryte engine 214 is 

20 running, it requests the bus regardless of its actual 
need for the bus. At this request time, spryte engine 
output transfers have the lowest priority other than the 
CPU. Once the spryte engine obtains the bus for output 
transfers, it keeps the bus regardless of its actual 

25 need for the bus. When any other requestor requests the 
bus, or if an interrupt occurs, D-bus arbiter 210 
signals the spryte engine 214 to relinquish the bus, 
which it must do within seven clock ticks. When spryte 
engine 214 does relinquish the bus, it automatically re- 

30 asserts its request. But since spryte engine 214 is now 
low on the priority list, all pending requestors and 
interrupts are serviced before the spryte engine 
transfers will again take place. 
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A start video request is issued by an S-port control 
unit 224 at the beginning of every scan line for which 
a video transfer was enabled. Since a scan line of low 
(or high) resolution pixels has a different length than 
5 the number of pixels in a VRAM page, it is frequent that 
a new page will need to be loaded into the static 
holding register of the VRAM in the middle of a scan 
line. More particularly, since the NEC VRAMs support 
split read transfers, the S-port control unit 224 issues 

10 a mid-line video transfer request in response to each 
edge transition in the LQSF or RQSF signals from the 
VRAM. Making the request to perform a read transfer on 
one half of a VRAM while the other half is shifting its 
data out the S-port, and giving such mid-line requests 

15 a high priority, helps prevent the video display from 
glitching. 

Slipstream capture requests are also given a high 
priority to ensure that captured data is not missed. 
Similarly, CLUT list transfer requests and CLUT mid- 
20 line requests are given a high priority to ensure their 
timely servicing. Slipstream capture requests, CLUT 
list transfer requests, CLUT mid-line transfer requests 
and start video requests are all extremely 
well-structured. They are all given the same 
25 arbitration priority since non-overlap of these requests 
is assured by other hardware. 

Refresh requests are performed in bursts of four. 
All four requests are issued at the same time and if no 
higher priority requestor takes the bus, all four will 
30 be processed without intervening page breaks. 

The CPU 102 is intentionally given the lowest 
priority in the arbitration for access to the D-bus port 
of system memory 108 because in the architecture of Fig. 
1, the CPU 102 is conceived to perform housekeeping 
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functions only. All the other functional units in the 
system are more tightly coupled with the memory than the 
CPU is, so they can perform their functions at high 
speed. In the past, the requirement that the CPU 
5 perform many of the detailed functions of an interactive 
multi-media . system either limited the performance and 
realism of the system, or mandated the use of a powerful 
and expensive CPU, or both. 

10 B. Spryte Manipulation Subsystem 

The spryte engine 214 (Fig. 2) is described in 
detail in the related SPRYTE RENDERING SYSTEM WITH 
IMPROVED CORNER CALCULATING ENGINE AND IMPROVED POLYGON - 
PAINT ENGINE and IMPROVED METHOD AND APPARATUS FOR 

15 PROCESSING IMAGE DATA applications, incorporated by 
reference herein. 

c. piaysK Bus Interface 

Fig. 5 is a symbolic block diagram of player bus 
20 interface 216. It comprises a 32-bit shift register 502 
controlled by a player bus control circuit 504. The 
parallel inputs and outputs of the shift register 502 
are both coupled to the address manipulator chip 106 
internal MDT data bus 202. The serial input line of 
25 shift register 502 is coupled to receive the PBDI signal 
from the player bus 136, and the serial output of shift 
register 502 drives the PBDO line of player bus 136. 

The player bus 136 uses a single stream serial 
protocol that is run at a rate of once per display 
30 field. The parallel data transfer in both directions 
between the system memory 108 and the shift register 502 
is handled by the DMA system. 

During vertical blanking scan lines 1-8, the player 
bus clock line PBCK (Fig. 1) is held low, and during 



WO 94/10641 W W PCT/US92/09349 



- 30 - 

scan lines 9-16, PBCK is held high. The long clock is 
used to synchronize all player devices to vertical 
blank. On scan line 17, if the DMA channel is enabled, 
the DMA system will first read the contents of the shift 
5 register 502 into a system memory address specified for 
input data,, and will then transfer a word from the 
system memory address specified for output data, into 
the shift register 502. The player bus control 
circuitry 504 then simultaneously shifts out the 32 bits 

10 to the player bus 136 and shifts in 32 new bits from the 
player bus 136. The shift rate is two bits out (and two 
bits in) per scan line, and is derived from the video 
clock provided by the audio/video processor 140. After 
completion of a 32-bit shift, the player bus control 

15 circuit 504 informs the DMA system that a new word pair 
is ready for transfer to and from system memory. When 
the DMA system indicates that all words have been 
transferred to/from shift register 502, the player bus 
control circuitry 504 then shifts out the last 32 bits 

20 of output data onto the player bus 136 and an interrupt 
is generated. 

On the player bus 136, the address manipulator chip 
106 drives the PBCK signal with enough strength to drive 
a long cable of daisy-chained player bus devices without 

25 buffering. This is to give devices down the line a 
valid timing relationship to the horizontal and vertical 
sync. Within each player bus device, the PBDO and PBDI 
signals are electrically regenerated and resynchronized 
to the local clock in each device. 

30 

D. Slow Bus Interface 

The slow bus 130 is an 8-bit interface that is 
used for communicating with ROM, battery-backed RAM, 
possible other RAM, a possible front panel device, 
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and/or any other non-handshaked 8-bit slow peripheral. 
The 8-bit data bus is bi-directional. Output data is 
merely the data present on bits 7:0 of D-bus 118, 120. 
For read cycles from other than the ROM 132 , the CPU 102 
5 considers only D(7:0) significant. When the slow bus is 
performing a ROM access, the interface 218 actually 
performs four read cycles over the slow bus, captures 
the four bytes of data from the ROM, and assembles them 
into a 32-bit word for the CPU 102. The two port 

10 strobes PDRDB and PDWRB are used as the two low-order 
address lines to the ROM. The remaining address lines 
come directly from the CPU over A(16:2). The control 
signals on the slow bus 130 include a ROM chip select, 
a front panel device chip select, perhaps other chip 

15 selects, and PDRDB and PDWRB as read and write strobes. 
Further subdivision of the front panel chip select 
strobe can be accomplished externally. The CPU address 
lines A(16:2) are used where the devices require address 
lines. 

20 The slow bus devices, including the CPU ROM 132, 

are accessible by the CPU 102 in a plurality of memory- 
mapped address ranges. For example, A(31:17) may 
indicate a CPU ROM 132 access, and bits A(16:2) may 
indicate the address within CPU ROM 132 to be accessed. 

25 Slow bus devices are not accessed by the DMA engine. 

v. Audio /Vjfleo Processor 

Fig. 6 is a symbolic block diagram depicting major 
functional blocks of the audio/video processor 140 
30 (Fig. 1). It comprises an I/O interface 602, which is 
coupled to receive address lines A(15:2) from system 
address bus 104. It is also coupled to the control 
lines 138 and bi-directionally coupled to the D(31:0) 
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data bus 118, 120. The I/O interface 602 is also 
coupled with an internal 16-bit CDT data bus 604. 

Audio/video processor 140 also includes the audio 
subsystem 606 , which generates a serial digital audio 
5 signal over audio output lines 157. The chip also 
includes a set of timers 608, an interrupt controller 
610, and a watchdog timer 612. It also includes an 
expansion bus interface 614 which is coupled to 
expansion bus 142. It also includes a local DMA arbiter 

10 and common requestor 616, which receives DMA requests 
from the various devices within audio/video processor 
140 and asserts a DMAREQ signal to the D-bus arbiter 210 
(Fig. 2) in address manipulator chip 106 when 
appropriate. Audio/video processor 140 also includes 

15 the video display path 618. This unit receives 32-bit 
words over the S-bus 122, performs CLDT translations and 
horizontal and/or vertical interpolation, as required, 
and provides the results to the audio/video output 
circuitry 152 over AD bus 158. Video display path 618 

20 also communicates with audio/video output circuitry 152 
over control lines 156. 

A. Video Display Path 

The video display path is described in detail in 
25 the related RESOLUTION ENHANCEMENT FOR VIDEO DISPLAY 
USING MULTI-LINE INTERPOLATION and METHOD AND APPARATUS 
FOR UPDATING A GLUT DURING HORIZONTAL BLANKING 
applications, incorporated by reference herein. 

30 B. Audio Subsystem 

The audio subsystem includes a high-performance 
digital signal processor (DSP) which is inherently 
cyclical in nature. A timer is provided which can be 
programmed to reset the DSP and return to the first 
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instruction periodically, typically once each sample of 
an audio input stream. The number of clock ticks per 
cycle can be governed by CPU 102 software, based on an 
internally generated tick frequency, or can be governed 
5 by externally provided reset signals. Pipeline 
operation in the DSP is enhanced through the use of a 
double-buffering system in which operands are latched 
into the first stage of a double buffer as soon as they 
are ready, but are transferred to the second stage only 

10 when the last-ready operand is available and the 
computation unit in the DSP is ready to receive the 
operands. The computation unit receives the operands 
from the second stage of the buffers. 

The audio subsystem has a program memory which . 

15 stores a DSP program provided by the CPU 102 via memory- 
mapped addresses. It also includes a random access data 
memory, a plurality of data input FIFOs, and a plurality 
of data output FIFOs. Each FIFO is associated with the 
respective location in the random access data memory. 

20 When the DSP reads a value from one of the locations in 
the random access data memory corresponding to an input 
FIFO, control means automatically refills that location 
from the corresponding FIFO. Similarly, whenever the 
DSP writes data to one of the locations corresponding to 

25 an output FIFO, control means automatically recognizes 
that and copies the data into the corresponding output 
FIFO. Input FIFOs are filled, and output FIFOs are 
emptied, by DMA under the control of address generator 
208 (Fig. 2) in address manipulator chip 106. A 

30 separate DMA channel is allocated to each FIFO. DMA 
requests to fill or empty a FIFO are made through the 
DMA requestor 616. 

Audio samples in the audio subsystem FIFOs are each 
16 bits long, and each of the input FIFOs are 16 bytes 
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(eight samples) deep. At its highest quality, all 16 
bits of a sample are used as the sample value for a 
sound. Alternatively, the audio subsystem can use only 
eight bits per sound sample. Other options are al60 
5 possible. When the audio subsystem pulls the last 
sample from, an input FIFO, and no more samples are 
forthcoming from a DMA transfer, the audio subsystem 
leaves the last sample in place in the FIFO in order to 
pull it again. 

10 The four output FIFOs of the audio subsystem may be 

used, for example, to create reverberation effects. 
When the audio subsystem has completed outputting data 
to an output FIFO, it can send a FLUSH command to that 
FIFO. This will set a bit to ensure that sufficient DMA 

15 requests are made to the arbiter 210 until the FIFO is 
empty . 

Instructions running in the DSP in the audio 
subsystem 606 may include a M write-back" bit, which 
causes the result of an operation automatically to be 

20 written back to a corresponding one of the operands. 
The DSP also supports an instruction which, if placed 
following a branch instruction in DSP program memory, 
may mandate another branch from the target instruction 
stream of the first branch after a predetermined number 

25 of instructions from that target stream have been 
executed. The DSP is also capable of moving data in 
response to one instruction, without affecting the 
progress of a computation which is taking place 
simultaneously in response to another instruction. The 

30 DSP also has an operand mask register which permits 
doing many similar operations with a single re-used 
constant . 

The DSP in the audio subsystem 606 operates at 
approximately 25MHz and can perform approximately 568 
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system clock ticks worth of processing on each sample of 
a typical audio signal. Such audio signals are received 
via the input FIFOs, by DMA transfers from system memory 
108. Possible original sources of this data can be, for 
5 example, samples taken from a CD-audio disk player, at 
the standard frequency of 44.1 or 88.2 kHz. 

Audio output from the subsystem 606 is provided in 
an IIS standard serial stream with 16 bits of data per 
channel and at a rate determined by an external digital 
10 filter clock. The audio subsystem 606 writes its left 
and right values (16 bits each) into an output stack 
that is a total of 64 bits deep. This allows for the 
outputting of up to two pairs of audio samples with no 
overflow. 

15 Audio subsystem 606 manipulates audio signals which 

have been digitized and stored as part of the CPU 
software. It can also generate its own audio signals, 
and can manipulate audio signals provided by an external 
source such as a CD-audio player or an FM synthesizer 

20 chip. It can also handle other digital audio sources 
which are coupled to the expansion bus 142. The audio 
subsystem 606 also can perform incoming CD-originated 
software and data decompression. 

25 C. Expansion Bus Interface 

The expansion bus 142 consists of eight bi- 
directional data lines, three control lines, one strobe, 
one ready line and one interrupt line. The three 
control lines define the type of cycle the bus is 

30 performing. No bus events occur due to the state or 
edge of any control line, and only the activation of the 
strobe will cause a bus event. 

Devices on the expansion bus 142 can assert an 
interrupt to the expansion bus interface 614 to receive 
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attention. The CPU 102 addresses devices on the 
expansion bus by reading or writing to predefined 
memory-mapped addresses. In particular , the CPU 102 
first writes an expansion bus address as data to a 
5 predefined memory mapped address, then reads or writes 
desired data from or to a second predefined memory- 
mapped address. 

VI. Address Generator 208 

10 Fig. 3 is a block diagram of parts of address 

generator 208 (Fig. 2). It comprises a multiplexer 302 
which has two 10-bit input ports and a 10-bit output 
port. The first of the input ports of multiplexer 302 
is coupled to receive MDT(25:16), and the second port 

15 is coupled to receive MDT( 31:24) in the low-order 8 bits 
and '00' in the high-order 2 bits. The output of 
multiplexer 302 is coupled to the input of a 10-bit wide 
bypassable storage element 304. The bypassable storage 
element 304 contains a register which may be bypassed in 

20 response to a select input signal 306. 12 high-order 
'0' bits are added to the 10-bit output of bypassable 
storage element 304 and provided to one input port of a 
22-bit wide multiplexer 308, the other input port of 
which is coupled to receive MDT(23:2). The 22-bit 

25 output of multiplexer 308 is provided to one input port 
of a 3 -input multiplexer 310, the second input port of 
which receives a word of all zeros. The select input of 
multiplexer 302 is coupled to receive a BPP signal, the 
select input of multiplexer 308 is coupled to receive a 

30 LOAD OFFSET signal, and the 2-bit select input of 
multiplexer 310 is coupled to receive an encoded signal 
indicating CTL, DMAOWN, or ZERO. 

The output of multiplexer 310 is connected to the D 
input of a 2 2 -bit wide DMA stack, containing 128 
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register locations. The D output of DMA stack 312 is 
coupled to one input port of a source multiplexer 314. 
The other input port of source multiplexer 314 is 
coupled to the output port of a 3-input multiplexer 316. 
5 One input port of multiplexer 316 receives the 22-bit 
spryte engine read or write address from spryte engines 
214 (Fig. 2), and a second input port of multiplexer 316 
receives bits A(23:2) of the CPU address. 

The output port of source multiplexer 314 is coupled 

10 over a source mux out bus 370 to the D input of a 22- 
bit wide register 318, the output of which is coupled to 
one input port of an adder/clipper 320. The 
adder/clipper 320 can, in response to control signals 
received over control lines 322: pass the first input 

15 port value unchanged; pass it incremented or decremented 
by one; pass it incremented by 320, 384, 512, 1024, 160 
or 192 (the number of words required to move the low- 
resolution pixel scanning window down by one. low- 
resolution scan line for various available buffer 

20 widths) ("add modulo"); add 1 to the upper portion of a 
number and clear the lower portion ("add RAS and clip 
CAS"); or add the first input port value to the second 
input port value. 

The output of adder/clipper 320 is coupled to the D 

25 input of a 22-bit wide register 324, the output port of 
which is coupled back to the third input port of 
multiplexer 316. The output of adder/clipper 320 is 
also coupled back to the third input port of multiplexer 
310. 

30 The output of the multiplexer 308, in addition to 

being provided to an input port of multiplexer 310, is 
also coupled to the input of a register 328, the output 
of which is coupled to the second input port of 
adder/clipper 320. The output of register 318 is also 
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couplable to the MDT data bus 202 at bits MDT(24:2) via 
three-state buffers 330. The output of register 318 is 
also coupled to one input port of another multiplexer 
332 , the other input port of which is coupled to receive 
5 the output of adder/clipper 320. The output of 
multiplexer .332 is coupled to the input of a register 
334, the output of which drives bits MADR(21:0) of 
internal address bus 204 of the address manipulator chip 
106. 

10 The low-order seven bits of the source mux out bus 

370 are also coupled to a stack address logic unit 336. 
Stack address logic unit 336 also receives the 7-bit DMA 
group address and the control lines from D-bus arbiter 
210 (Fig. 2). Stack address logic 336 generates a 7- 

15 bit write address and a 7-bit read address for the DMA 
stack 312. 

The 22-bit source mux out bus 370 is also coupled 
to an address verify unit 338 and to a "page break 
imminent" (PBI) detector 340. The address verify unit 

20 338 issues an abort signal to CPU 102 if there is no 
physical memory at the specified address, if the 
specified address is illegal for the current user, or if 
the address was issued during a DMA transfer of graphics 
or audio and the address is within SYSRAM space (if 

25 SYSRAM exists). Software running on CPU 102 can write 
registers in the address verify unit 338 to indicate the 
locations of physical memory, legal addresses for the 
present user, and the size of SYSRAM. 

PBI detector 340 outputs a signal, which is stored 

30 in a D flip-flop 342, when the specified address is part 
of a sequential series of accesses and points to the 
last word in a page of system memory 108. If it is, the 
immediately subsequent memory access cycle is held while 
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the new row address is provided to system memory 108 and 
an appropriate RAS signal strobed. 

The source mux out bus 370 is also provided both 
directly and via a register 344, to respective input 
5 ports of left address pad logic 345, which selects 
either the high- or low-order 11 bits for the 11 -bit LA 
output 110 of the address manipulator 106. The high- 
order 11 bits indicate a left bank system memory row 
address, and the low-order 11 bits indicate a column 

10 address for the left bank of system memory. Similarly, 
the source mux out bus 370 is also provided, both 
directly and via a register 352, to respective input 
ports of right address pad logic 353, the output of 
which drives the 11-bit RA output bus 114 of the address 

15 manipulator chip 106. 

The high-order 13 bits of the output of register 
344 are coupled to one input port of a multiplexer 360, 
the other input port of which is connected to receive 
the high-order 11 bits of the output of register 352. 

20 The output of multiplexer 360 indicates the current page 
of memory access either in the left or right bank, and 
is provided to a "within page detector" (not shown) in 
the spryte engine 214 (Pig. 2). 

An optional memory management unit (not shown) may 

25 be inserted on the source mux out bus 370 prior to 
registers 344 and 352 and pad logic units 345 and 353. 

The address generator 208 also includes left mid- 
line request logic 362, which receives the LQSF signal 
from the left memory bank 108A, and receives an LTYPE 

30 signal, indicating whether the transfer currently taking 
place over the S-port in the left memory bank 108A is a 
video transfer or a CLUT list transfer. Left bank 
mid-line request logic 362 issues a mid-line split 
transfer request to the D-bus arbiter 210 at an 
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appropriate time depending on LTYPE. If LTYPE indicates 
that the S-port transfer then taking place is a CLUT 
list transfer, then the logic 362 dees not issue a split 
transfer mid-line request. If LTYPE indicates that the 
5 current S-port transfer is a video transfer, then the 
logic 362 issues the split transfer mid-line request 
just after each preceding half -page begins shifting out 
the S-port, that is, in response to each edge transition 
of LQSF. 

10 Similarly to the left bank mid-line request logic, 

address generator 208 also includes right mid-line 
request logic 364, which receives the RQSF signal from 
the right memory bank 108B, and receives an RTYPE 
signal, indicating whether the transfer currently taking 

15 place over the S-port in the right memory bank 108B is 
a video transfer or a CLDT list transfer- Right bank 
mid-line request logic 364 issues a mid-line split 
transfer request to the D-bus arbiter 210 at an 
appropriate time depending on RTYPE, similarly to the . 

20 timing of mid-line split transfer requests issued by 
left bank mid-line request logic 362. 

Fig. 4 is a block diagram of stack address logic 
336 (Fig. 3). Referring to Fig. 4, the DMA group 
address and the low-order seven bits of the source 

2 5 multiplexer 314 output are coupled to respective input 
ports of a multiplexer 402. The high-order five bits of 
the output of multiplexer 402 are provided to the input 
of a register 404, and the low-order two bits of the 
output of multiplexer 402 are coupled to the input of a 

30 register 406. Register 404 also receives a HOLD signal 
to a its chip enable. The seven bits held in registers 
404 and 406 are re-concatenated and provided to one 
input port of a multiplexer 408, the other input port of 
which is coupled to receive a 7-bit signal made up of 
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bits 6:3 from the concatenated word, and bits 2:0 of 
control. The output of multiplexer 408 is coupled to 
the input of another register 410, the chip enable of 
which is coupled to receive an AE signal. The output of 
5 register 410 forms the 7-bit write address for the DMA 
stack 312 (Fig. 3) . 

The 2-bit output of register 406 is also coupled to 
one input port of a multiplexer 412, the other input 
port of which is coupled to receive two bits of control 

10 information. The 2-bit output of multiplexer 412 is 
concatenated as low-order bits with the 5 -bit output of 
register 404, to form the 7-bit read address input to 
DMA stack 312. 

Fig. 7 is a block diagram of left address pad logic 

15 345 (Fig. 3). Write address pad logic 353 is identical. 
As shown in Fig. 7 , the 22 bits of the source mux out 
bus 370 are split into high- and low-order 11-bit 
portions , representing row and column portions of the 
address, respectively. The row and column portions are 

20 provided inside pad logic 345 to two respective input 
ports of a three-input, 11-bit wide multiplexer 702. 
The column portion of the source mux out bus 370 is also 
connected to one input port of a two- input, 11 -bit wide 
multiplexer 704. Similarly, the 22-bit output of D 

25 flip-flop 344 (Fig. 3) is split inside pad logic 345 
into high- and low-order 11-bit portions, representing 
the row and column portions of the address, 
respectively. The row and column portions are provided 
to two inputs of another three-input multiplexer 706, 

30 and to the column portions provided to the second input 
port of multiplexer 704. The output of multiplexer 704 
is connected to the data input of an 11-bit D flip-flop 
708, the output of which is connected to the third input 
port of multiplexer 702. The output of multiplexer 702 
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is connected to the third input port of multiplexer 706. 
The output of multiplexer 706 is coupled, via an 11-bit 
register 710, to the LA(10:0) output lines 110 for 
addressing the left bank of system memory 108. The left 
5 address pad logic 345 provides deglitching functions in 
the selection of the row or column portion of the 
address on the source mux out bus 370 for presentation 
to the address pins of system memory 108, 

Referring again to Fig. 3, address generator 208 

10 also includes a pre-f irst-lef t n logic 366 which 
determines, based on various system conditions, which 
bank of system memory 108 contains even or odd scan line 
pixel data at the current word address. Address 
generator 208 also contains a control unit 368 for 

15 generating the various control signals used in the 
address generator 208. All the registers shown in Figs. 
3 and 4 are clocked by a common clock signal operating 
at approximately 25MHz. 

The operation of address generator 208 will be 

20 described first with respect to CPU 102 memory access 
cycles. When the CPU interface 206 (Fig. 2) receives a 
memory access cycle from the CPU 102, it requests the 
D-bus and the address generator 208 from arbiter 210.- 
When granted, multiplexers 316 and 314 are controlled to 

25 select CPU-originated address lines A(23:2) onto the 
source mux output bus 370. This address is loaded into 
the registers 348 and 356 via multiplexers 346 and 354, 
respectively. If a new page (row) of system memory 
needs to be established, multiplexers 350 and 358 select 

30 the row address portion of the CPU address onto 
respective left and right bank address lines 110 and 
114, and appropriate RAS signals are strobed. A new 
page is always established for every new device which 
gains access to system memory. Respective column 
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addresses are then selected onto the left and right 
address buses 110 and 114, CASs are strobed, and data is 
read from or written to both banks 108A and 108B, to or 
from the CPU 102, simultaneously (see Fig. 1). 
5 If the CPU 102 determines that the next word address 

which it desires to access is either the same as the 
previous word address or the sequentially next higher 
word address, then it so indicates over its output lines 
Nrareq and seq. Address manipulator chip 106 recognizes 

10 this and, instead of waiting for the CPU 102 to actually 
drive the new address onto the address bus 104, forms 
the address itself in address generator 208. In 
particular, the current CPU word address is stored in 
register 318 (Fig. 3). In response to Nmreq and seq, 

15 control circuitry 368 causes adder/clipper 320 to pass 
this value either unchanged or incremented to the 
register 324, as appropriate, where it is stored on the 
next clock tick. Multiplexers 316 and 314 are then 
controlled to select the output of register 324 as the 

20 next address applied to the source mux out bus 370. 
Unless indicated by PBI detector 340, addresses 
generated sequentially by the address generator 208 do 
not require establishing a new page address in system 
memory 108. 

25 The operation of address generator 208 with respect 

to spryte engine read and write addresses provided by 
the spryte engine 214, is similar to the operation with 
respect to CPU 102 -originated addresses. Spryte engine 
addresses are multiplexed onto source mux out bus 370 

30 from the first input port of multiplexer 316, When the 
spryte engine has control of the D-bus and is providing 
addresses to the first port of multiplexer 316, all 
address calculations are performed by the spryte engine 
214. Adder/clipper 320 is not used. 
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Other than CPU-originated addresses and spryte 
destination addresses, all system memory accesses are 
performed by DMA using the DMA stack 312. 

The 128 22-bit registers in the DMA stack 312 are 
5 organized in groups, each group storing the information 
required to control a respective DMA "channel - . Each 
group is located at a respective fixed set of addresses 
in the DMA stack 312 , and each channel is predefined to 
control transfers from a particular source device to a 

10 particular destination device. Table I sets forth 
various information about each type of DMA address 
grouping. The "Interrupts Available" column indicates 
whether the channel can be programmed to interrupt the 
CPU 102 upon completion of a transfer. For some 

15 channels, such as the audio data channels controlling 
transfers to a DSP input FIFO, the channel can be 
programmed not to interrupt the CPU 102 until after the 
FIFO is emptied following completion of the transfer. 
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45 



50 



TART JP I 

Grouping No. Registers Looping Interrupts Transfers 
Type Groups Per Grp Available Available Controlled 



Channels 
Available 



Max DMA 

Burst 





S-Port Read 


1 


7 


No 


No 


CLUT control 


1 


4 words 




Transfers 










CLUTdisppath 


1 


1 word 














|*"f ITT* miri_1i*t«> 

v«ixUi nuu-iuic 


1 
i 


1 word 


10 












Video output 


1 


1 word 














Video mid-line 


1 


1 word 




S-Port Write 


2 


4 


Yes 


No 


Frame grabber 


1 


N/A 




Transfers 










Command 


1 


N/A 


15 


("Slipstream") 










grabber 








Refresh 


i 


1 


No 


No 


Sys mem refresh 


1 


4 words 




Audio Data 


18 


4 


Yes 


ies 


ays mem to uar 






20 












FIFO 


13 


3 words 












DSP FIFO to 


















sys mem 


4 


3 words 














Sys mem to 


















DSPEIRAM 


1 


4 words 


25 




















Expansion Bus 


2 


4 


Yes 


Yes 


Sys mem to/from 








Data 










expansion bus 


2 


4 words 




Player Bus 


I 


3 


No 


Yes 


Sys mem to/from 






30 


Data 










player bus 


1 


1 word 




Spryte Engine 


1 


8 


No 


No 


Spryte control 
















data 


1 


7 words 














PIP data 


1 


4 words 


35 












Spryte source 


















data start 


1 


1 word 














Spryte image data 


2 


4 word 




Coprocessor 


4 


4 


Yes 


Yes 


Sys mem to/from 






40 


Data 










decompression 


















coprocessor 


2 


4 word 














Sys mem to/from 


















other 


2 


4 word 



a. Audio Data Groups 

An audio data group consists of four DMA stack 
registers for holding a current address, a remaining 
length, a back-up address (next starting address) and a 
back-up length (next starting length). Typically prior 
to an audio data transfer to or from the audio subsystem 
in audio/video processor 140, the CPU 102 will write the 
starting address and desired length of transfer into the 
current address and remaining length registers for the 
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desired channel in DMA stack 312. These registers are 
all addressable by the CPU 102 as memory-mapped hardware 
registers. For transfers from system memory to an audio 
subsystem input FIFO, the address to be written into the 
5 DMA stack register for the desired grouping is the 
system memory source address. For data to be read from 
an audio subsystem output FIFO to system memory, the 
address to be stored in the DMA stack register for the 
grouping is the system memory destination address. The 

10 audio subsystem has 13 data input FIFOs and four data 
output FIFOs, each of which are referred to as a 
separate channel, and each of which corresponds to a 
different one of the audio data groupings in the DMA 
stack 312. Thus, the CPU 102 specifies a particular 

15 source or destination FIFO merely by choosing the 
corresponding audio data grouping in the DMA stack 312, 
For transfers from system memory directly to desired 
addresses in an EIRAM which is internal to the audio 
subsystem, the 32-bit words to be transferred from 

20 system memory are each divided into fields containing 
the data to be written to EIRAM (typically 16 bits 
wide) , the DSP internal EIRAM address to which data is 
to be written, and control information for the 
particular word transfer. 

25 When a current address is written by the CPU 102, 

the address generator 208 automatically sets the next 
starting address register for that group to zero in the 
next clock tick. When the CPU 102 writes a value to the 
remaining length register of a group, the address 

30 generator 208 automatically sets the next starting 
length register to zero in the next clock tick. These 
values of zero are used as control flags. The DMA 
operation is not allowed to be pointed at system memory 
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at address zero, nor is it allowed to have a length of 
zero. 

The audio subsystem is part of the audio/video 
processor 140, so all requests for DMA transfers reach 
5 the arbiter 210 over the single DMAREQ line previously 
described. This request includes the desired channel 
number. When the arbiter 210 grants control of the D- 
bus and address generator to the audio/video processor 
140 requestor, it provides the DMA group address for the 

10 desired channel to stack address logic 336 (Fig. 3) as 
previously described. The DMA group address is passed 
by multiplexer 402 (Fig. 4) to the registers 404 and 
406. The read address for DMA stack 312 is then 
assembled as the high-order five bits from register 404, 

15 and the low-order two bits from register 406. For audio 
data transfers, the low-order two bits are zero. 
Accordingly, the read address provided to the DMA stack 
312 now points to the first register in the 4-register 
grouping for the particular audio channel. 

20 The contents of this register, which represents the 

system memory starting address, are read out of the DMA 
stack 312 and selected by source multiplexer 314 onto 
the source mux out bus 370. The address is transmitted- 
via left and right address pad logic 345 and 353 to the 

25 left and right bank address buses of system memory 108, 
respectively, in a manner similar to that for CP0- 
originated addresses and spryte destination addresses 
described above. The external processor interface 220 
(Fig. 2) of address manipulator 106 drives the 

30 appropriate CCODE onto control bus 138 to indicate to 
audio/video processor 140 that the DMA transfer which it 
requested is now taking place. For transfers from 
system memory to the audio subsystem, system memory 108 
drives a word of data from the selected address onto the 
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D-bus 118, 120, and audio/video processor 140 latches in 
the data from D-bus 118, 120 and internally routes it to 
the requesting FIFO or EIRAM address. For transfers 
from the audio subsystem to system, memory 108, the 
5 audio/video processor chip 140 drives the data from the 
appropriate audio subsystem output FIFO onto the D-bus 
118, 120 and system memory 108 stores it at the selected 
address. 

In addition to being provided to the address lines 

10 of system memory 108, the address information on the 
source mux out bus 370 (Fig. 3) is also captured in 
register 318, incremented by adder/clipper 320, and 
written back to the same address in DMA stack 312 via 
multiplexer 310. The write address for DMA stack 312 

15 comes from register (Fig. 4) which previously derived it 
from the registers 404 and 406, via the lower input port 
of multiplexer 408. The read address for DMA stack 312 
is now formed by the high-order 5 bits from register 404 
(Fig. 4) and the low order 2 bits selected by 

20 multiplexer 412 from the control unit 368. At this 
time, control unit 368 provides '01 # of the low-order 2 
bits, since the length indication is stored in the 
second register in the group. DMA stack 312 is dual- 
ported, so reads and writes to different registers in 

25 the DMA stack 312 may be accomplished simultaneously. 
The length indication, which for audio groups indicates 
the number of words to be transferred, passes through 
source multiplexer 314 to register 318, and is 
decremented by adder /clipper 320 and written back to the 

30 same location in the grouping in DMA stack 312. 
Similarly to the writeback of the incremented starting 
address, the write address for the decremented length 
indication is provided by register 410 (Fig. 4), having 
previously been derived from register 404 and control 
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lines from control circuitry 368 via the upper input 
port of multiplexer 408. 

If the length as decremented is positive, then the 
address generator 208 causes the next data word to be 
5 transferred in the same manner as the first. Words are 
transferred in this manner until the maximum contiguous 
burst length is reached (either three or four words — 
see Table I), or the length indication as decremented is 
zero. 

10 Some time prior to the completion of this data 

transfer, if desired, the CPU 102 may write a backup 
address and a backup length into the third and fourth 
registers of the audio group. Typically, this is done 
before the data transfer begins. When the value in the 

15 remaining length register becomes zero, the access is 
completed and a "loop test" is performed. According to 
the loop test, if the next starting address register is 
zero, the DMA transfer has completed. If the next 
starting address value is non-zero, then it is copied to 

20 the current address register for the group and the next 
starting length value is copied into the remaining 
length register. Such copying occurs via the source 
multiplexer 314, register 318, adder/clipper 320, and 
multiplexer 310. The appropriate read and write stack 

25 addresses are generated by the control circuitry 368 
through the stack address logic 336. The address 
generator 208 then continues the DMA transfer in the 
same manner as set forth above with respect to the 
original transfers specified for the group. This 

30 process repeats until the CPU 102 writes a zero into the 
next starting address register of the particular audio 
grouping. 

DMA looping can be achieved in this manner. Looping 
is useful, for example, to send a cyclical series of 
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audio samples to the audio subsystem to have it generate 
a tone having a particular frequency. Looping action in 
a DMA transfer always causes a page break (causes 
address generator 208 to establish a new page address), 
5 whether or not the loop is successful. 

As can be seen from Table I, DMA transfers are 
usually in bursts of three or four. For transfers which 
involve the audio/video processor 140 (audio data 
transfers, expansion bus transfers and co-processor data 

10 transfers), as previously mentioned, address manipulator 
106 provides a CCODE on control bus 138 to inform the 
audio/ video processor 140 of the current status of a DMA 
process that is currently active. A CCODE is sent for 
each data word transferred, and the audio/video 

15 processor 140 responds to each such CCODE either by 
driving data onto the D-bus or by latching data from the 
D-bus. Additionally, one bit of information in the 
CCODE further indicates that the current length register 
decremented to zero during this burst, and that the DMA 

20 channel will now loop. The CCODE also contains a bit of 
information which indicates that the current length 
register decremented past zero during this burst, and 
the next starting address register is already at zero. 
This means that the DMA channel is not going to loop and 

25 the entire specified transfer is complete. 

B. Expansion Bus Data aiifl CO-PrOCgSSQC Pfltfl 

growings 

30 The DMA channels for data transfer between system 

memory 108 and the expansion bus 142, as well as the DMA 
channels for transfers between system memory 108 and 
either the decompression co-processor 166 or another 
external co-processor (not shown), all operate similarly 

3 5 to the audio data channels. Looping is performed in a 
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similar manner, and since the expansion bus interface is 
a requestor from within the audio/video processor 140, 
the CCODEs generated by address manipulator 106 are also 
the same. The decompression co-processor 166 is not 
5 strictly within the audio/video processor 140, but it 
makes its DMA requests through the DMA requestor in 
audio/video processor 140 and is therefore handled in 
the same manner. The same is true with respect to an 
external co-processor. The decompression co-processor 
10 166 and the external co-processor are coupled to the 
control bus 138 and therefore receive CCODEs directly. 



c. s-Port Trgngfers 

The DMA stack 312 contains one 8-register group 
15 (only seven of which are used) to control read transfers 
out the S-port of VRAM in system memory 108. The S-port 
transfers themselves do not require control of the D-bus 
or the address generator 208, but S-port activity can be 
controlled only via commands issued over the D-bus. The 
20 registers in the group are set forth in Table II. 

TfrBIjE II 

0 Current CLUT Address 

25 1 Next CLUT Address 

2 CLUT Mid-Line Address 

3 — 

4 Previous Line Video Address 

5 Current Line Video Address 

30 6 Previous Line Mid-Line Address 

7 Current Line Mid -Line Address 

In order to coordinate control of the video display 
35 path with the display scanning operation, the system of 
Fig* 1 transmits all of such commands down the display 
path during an allocated portion of each horizontal 
blanking period. In particular, about 50 words of 
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transfer time are allotted during each horizontal 
blanking period. These commands are mostly directed to 
the color look-up table (CLUT) , thereby permitting the 
CLUTs (there are three CLUTs for a scan line — one for 
5 each primary color) to be updated each scan line. The 
use of the commands ("color words") by the CLUTs, and 
the structure of the CLUT system, are described in the 
related METHOD AND APPARATUS FOR UPDATING A CLUT DURING 
HORIZONTAL BLANKING application. Other commands 

10 ("control words") are directed to the interpolation 
mechanism, described in the related RESOLUTION 
ENHANCEMENT FOR VIDEO DISPLAY USING MULTI-LINE 
INTERPOLATION application. Still other control words 
are directed to the audio/video output circuitry 152 

15 (Fig. 1), and are passed by the audio/video processor 
140 to audio/video output circuitry 152 over the AD bus 
158. Note that in another embodiment, other otherwise 
unused slots on the S-bus may be used to transmit 
commands down the video display path, such as during 

20 start-up and/or during vertical blanking. 

Control registers in audio/video output circuitry 
152 are write-only, and are handled completely in the 
CLUT list transfer process. There is no direct access, 
from the CPU 102 to the registers in audio/video output 

25 circuitry 152, nor can any of such registers be read by 
the system. 

The control words to be transmitted down the video 
display path during the allocated portion of the 
horizontal blanking period are prepared in advance by 
30 the CPU 102 in the form of a linked list set up by the 
CPU in VRAM. Although the control words are not always 
intended for the CLUTs, this list is sometimes referred 
to herein as a CLUT list. A CLUT list contains four 
mandatory control words, optionally followed by a 
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sequential list of up to about 50 command words. The 
structure of the words in the GLUT list is as follows: 

CLUT DMA control word 

5 

3 bits select a display mode (320, 384, 512 or 1024 
pixels per scan line) 

1 bit enables Slip Stream capture during horizontal 
10 blanking 

1 bit enables the video DMA function 

1 bit selects the frame-grabber or the command- 
15 grabber DMA slipstream channel 

1 bit sets a video mode for the upcoming scan lines 
to indicate whether 240 or 480 pixels will be 
provided 



20 



40 



45 



1 bit indicates whether the "next CLUT list 
address" is absolute or relative 



1 bit specifies whether the "previous line video 
25 address" for subsequent scan lines is to be 

calculated by adding a modulo or by re-using 
each previously used "current line video 
address" 

30 1 bit indicates the validity of the "current line 

video address" below (0 means re-use the 
prior "current line video address") 

1 bit indicates the validity of the "previous line 
35 video address" below (0 means re-use the 

prior "previous line video address") 



6 bits indicate the length in words of this list 
(minus 4) 

9 bits indicate the number of scan lines to wait 
before processing the next CLOT list 

Current Line Video Address 

Physical address from which to begin fetching 
"current" line pixel data after processing this CLOT 
list. 
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Previous Line Video Address 

Physical address from which to begin fetching 
"previous" line pixel data after processing this 
5 CLOT list. 

Next CLUT List Address 

Address from which the next CLUT list should be 
10 fetched, after the number of scan lines specified in 

the CLUT DMA control word have been transmitted. 

Immediately following the above four control words 
is a contiguous sequence of up to 50 color and/or 

15 control words. Color words are indicated by a zero in 
bit 31. Eight bits of the word carry a red color value, 
eight bits carry a green color value, and eight bits 
carry a blue color value. Five bits of the word 
indicate the CLUT address to which this color data is to 

20 be written (the pen number), and two bits indicate 
whether to write red only, green only, blue only or all 
three colors at the indicated pen. Accordingly, by 
appropriately preparing the CLUT list, it is possible 
for the software running on CPU 102 to update individual 

25 color values for each of the 32 pen numbers in each of 
the three colors represented in the CLUT. 32 color 
control words are required to update the entire CLUT. 
As explained in more detail in the related METHOD AND 
APPARATUS FOR UPDATING A CLUT DURING HORIZONTAL BLANKING 

30 application, the CLUT which is modified by these color 
control words is the CLUT which is used to translate 
"current line" pixel data. The CLUT which is used to 
translate "previous line** pixel data is always updated 
in its entirety from the "current line" CLUT after each 

35 scan line. 

If bit 31 of a color/control word is one, and if bit 
30 is zero, then the word contains control information 
for the audio/video output circuitry 152. The audio/ 
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video processor circuitry 140 receives this word over 
the S-bus 122, and forwards it to the audio/video output 
circuitry 152 for processing. 

If bits 31 and 30 are both one, and if bit 29 is 
5 zero, then the word is a display control word and 
contains the following information: 

pjisplsY control worfl 

10 1 bit forces audio/video processor 140 to send a 

null control word to audio/video output 
circuitry 152 



15 



30 



1 bit selects the NTSC or PAL transmission standard 

1 bit enables CLUT bypass — see related application 

1 bit enables a ° surprise" 640 display mode 

20 1 bit forces transparency always, letting overlay 

data be displayed from slipstream capture 

1 bit enables the background color detector in the 
display path to indicate transparency 

25 

1 bit enables random number generator for the three 
LSBs of CLUT bypass 

1 bit enables a window MSB replication, gate 

1 bit requests a single line disable of vertical 
interpolation 

1 bit swaps the meaning of the horizontal and 
35 vertical subposition bits 

2 bits select the vertical subposition bit source 

from: 0, 1, use frame buffer bit, and 
maintain prior setting 

40 

2 bits select the horizontal subposition bit source 
from: 0, 1, use frame buffer bit, and 
maintain prior setting 

45 2 bits select the blue pen LSB source from: 0, use 

frame buffer data bit 0, use frame buffer 
data bit 5, and maintain prior setting 
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1 bit enables vertical interpolation 

1 bit enables horizontal interpolation 

5 9 bits have the same meaning with respect to a 

window within the display field , as the last- 
mentioned 9 bits have with respect to the 
remainder of the display. The window is 
described in the related APPARATUS AND METHOD 
10 FOR UPDATING A CLUT DURING HORIZONTAL 

BLANKING application. 

If bits 31, 30 and 29 of a color/control word are 

15 all one, then the word contains three 8-bit color fields 
(red, green and blue) for writing to the "background" 
pen of the CLUTs. 

During initialization, the CPU 102 writes the 
address of a "top of field" CLUT list into register 1 

20 (next CLUT address) of the S-port read transfer group 
in DMA stack 312. If enabled, the top of field CLUT 
list is initiated every field by the CLUT control 
circuitry near the end of scan line 5 (or 4, depending 
on which field is being generated) . To initiate the 

25 action, S-port control circuitry 224 of the address 
manipulator chip 106 (Fig. 2) issues a request to 
arbiter 210. When granted, the arbiter 210 transmits 
the DMA group address for S-port read transfers to the 
stack address logic 336 in the address generator 208. 

30 Stack address logic 336 generates the stack address of 
the 8-register S-port read grouping to the read address 
input of DMA stack 312, and source multiplexer 314 
drives the top of field CLUT address from register 1 of 
that DMA stack register grouping onto the source mux out 

35 bus 370. The top-of-field CLUT address is also copied 
into register 0 of the group to indicate the current 
CLUT address. In the manner previously described with 
respect to audio data transfers, the top of field CLUT 
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address is provided to the left and right memory banks 
108A and 108B, which in turn drive the first word of the 
GLUT list (i.e./ the DMA control word), onto the D-bus 
118, 120. The address manipulator chip 106 receives the 
5 word onto its MDT data bus 202 via the buffers 222 (Fig. 
2) and provides it to the S-port control circuitry 224. 
Appropriate bits are set in the S-port control unit 224 
in response to the settings in the control word. 
Additionally, the CLUT list length indication from the 
10 control word is loaded into a word counter (not shown), 
and the number of scan lines to wait before processing 
the next CLUT list is loaded into a scan line counter 
(not shown) . 

Address generator 208 automatically increments the 

15 system memory read address from register 0 of the group 
in the manner previously described, using adder/clipper 
320, register 324 and multiplexer 316 providing the 
output of register 324 through the source multiplexer 
314 back onto the source mux out bus 370. System memory 

20 108 accordingly then drives the second word of the CLUT 
list onto the D-bus 118, 120. This value represents the 
address from which to begin fetching "current" line 
pixel data for the current scan line. Referring to Fig. 
3, bits 23:2 of this address are taken from the MDT data 

25 bus 202 and provided to the DMA stack input port 312 via 
multiplexers 308 and 310. The address generator control 
circuitry 368 provides appropriate control signals to 
the stack address logic 336 (Fig. 4) such that this 
information will be written into register 5 of the 

30 S-port read transfer group, the current line video 
address register. 

The address generator 208 then increments the 
address provided to system memory 108 again, such that 
system memory 108 drives the third word of the CLUT list 
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onto D-bus 118, 120. This information, representing the 
address from which to begin fetching "previous" line 
pixel data, reaches the data input port of DMA stack 312 
in the same manner as did the current line video 
5 address. Address generator control circuitry 368 
controls the stack address logic 336 to provide the 
address of register 4 of the S-port read transfer group 
to the write address input of the DMA stack 312, so that 
this information may be written to register holding the 

10 previous line video address. In a similar manner, the 
fourth word of the CLUT list, representing the address 
of the next CLUT list, is written into register 1 of the 
S-port read transfer register grouping of the DMA stack 
312, overwriting the top-of-field CLUT list address. 

15 After these four transfers take place, if the CLUT 

DMA control word indicated a non-zero number of 
color/display path control words to follow, address 
generator 208 initiates a CLUT list display path 
transfer. In particular, address generator 208 drives 

20 the value in current CLUT address register 0 of the S- 
port read transfer grouping in DMA stack 312, onto the 
left and right address buses of system memory 108, and 
S-port control circuitry 224 drives appropriate control 
signals to both banks of VRAM to cause a full read 

25 transfer of the page of the specified address into the 
static holding register. (See the above-mentioned NEC 
Datasheets.) The S-port control circuitry 224 also 
loads the column address into the left and right bank 
VRAM as the starting tap for shifting purposes. Address 

30 generator 208 also at this time uses the adder/clipper 
320 to calculate the next page address which will have 
to be written to the VRAMs, and stores this in register 
2 ("CLUT mid-line address") of the S-port read transfer 
address grouping in DMA stack 312. The S-port control 
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circuitry 224 then issues the appropriate number of 
serial clocks to the left and right banks of VRAM in 
system memory 108, to transmit the entire list of 
color/display path control words over the S-bus and down 
5 the video display path. During this time the D-bus is 
free to service other requestors. 

After the GLUT mid-line address is calculated, a 
GLUT mid-line request is issued automatically to the 
D-bus arbiter 210, whether or not it is needed. Note 

10 that a second mid-line request for a GLUT transfer will 
never be needed since a CLUT list can never be more than 
64 words long whereas a half -page of the VRAM is 256 
words long. When the D-bus is granted for the automatic 
CLUT mid-line request, address generator 208 provides 

15 the CLUT mid-line address from register 2 of the S-port 
read transfer grouping, for issuing the split read 
transfer command to the VRAMs . S-port control circuitry 
224 continues issuing serial clocks to the left and 
right banks of VRAM until all the words of color/display 

20 path control have been transmitted. The D-bus and 
address generator 208 are available for other uses 
during this time. 

Shortly before the end of the horizontal blanking 
period, S-port control circuitry 224 issues a start 

25 video request to the arbiter 210. When granted, the 
address generator 208 latches the "previous line video 
address" from register 4 of the S-port read transfer 
grouping, into the registers 348 and 356. PFL logic 366 
determines whether the previous line pixel data will 

30 come from the left or right bank of VRAM, depending on 
various system conditions. In either case, S-port 
control circuitry 224 applies the appropriate signals to 
the appropriate VRAM to have an initial full read 
transfer performed and the appropriate column number 



WO 94/10641 W W PCT/US92/09349 



- 60 - 

written into the VRAM as a starting tap. Similarly, the 
"current line video address" from register 5 of the S- 
port read transfer grouping is used to perform an 
initial full read transfer and to set the starting tap 
5 of the opposite bank of VRAM. The address generator 208 
also at this time calculates the next half-page address 
for previous line pixel data and stores it in register 
6 of the S-port read transfer grouping, and calculates 
the next half -page address for current line pixel data 

10 and stores it in register 7 of the S-port read transfer 
grouping in DMA stack 312, The D-bus and address 
generator 208 are then released to service other 
requestors, and S-port control circuitry 224 issues the 
proper serial clocks to the VRAMs to transmit previous 

15 and current line pixel data over the S-bus and down the 
video display path. 

Previous and current mid-line requests are issued in 
response to LQSF and RQSF with the previous and current 
line mid-line addresses stored in registers 6 and 7 of 

20 the S-port read transfer grouping in DMA stack 312 being 
updated each time with an incremented half-page address. 

After the number of horizontal scan lines specified 
in the CLUT DMA control word complete, and when the 
portion of the horizontal blanking period allocated to 

25 GLUT list transfers arrives, S-port control unit 224 
issues a request to arbiter 210 for control of the D-bus 
and address generator 208. When granted, the S-port 
transfer of the next CLUT list begins in the same manner 
set forth above. If the number of scan lines to wait 

30 before loading the next CLUT list is zero, then S-port 
control 224 no longer checks for new transfer requests 
until the next "top of field" occurs. The top of field 
CLUT list transfer will take place beginning with the 
address specified in register 1. 
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D. DMA Stack Regis ter Grouping for System Memory 
Refresh 

5 The refresh address is held in a, single DMA stack 

location. Whenever the horizontal scan line count 
reaches a predefined pixel position, four requests are 
made to arbiter 210 for system memory refresh 
operations. The four requests are independent and are 

10 handled as separate requests. When a refresh request is 
granted, arbiter 210 provides the DMA group address of 
the refresh counter to stack address logic 336, which 
reads the current refresh address onto the source mux 
out bus 370. The address is loaded into both registers 

15 348 and 356 and the left and right pad address logic 345 
and 353 each select the low-order bits for driving onto 
the address buses 110 and 114. Address generator 208 
then issues the appropriate RAS and CAS signals to both 
memory banks to thereby refresh the system memory page 

20 pointed to by the low-order bits of the refresh address. 
The refresh address is then incremented by adder/clipper 
320 and written back into the refresh address register 
in DMA stack 312. 

25 E. Player Bus DMA Register Grouping 

The register grouping in DMA stack 312 for transfers 
between system memory and the player bus contains three 
registers: the system memory write address, the system 
memory read address, and the length of transfer in 

30 words. As previously mentioned the player bus uses a 
single stream serial protocol that is run at a rate of 
once per display field. Details of the player bus and 
the serial protocol may be found in the related PLAYER 
BUS APPARATUS AND METHOD application. When the display 

35 is on scan line 17, if the player bus DMA channel is 
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enabled, player bus control 504 (Fig. 5) makes a request 
to arbiter 210 for control of the D-bus and the address 
generator 208. When granted, the address generator 208 
will first read the contents of the .32 -bit player bus 
5 shift register 502 and transfer it to the address in 
system memory 108 specified by the system memory write 
address register in the player bus grouping of the DMA 
stack 312. Note this first word contains information 
shifted in at the end of the previous stream and is 

10 therefore irrelevant and should be ignored by the 
software. The address generator 208 then transfers a 
32-bit word from the system memory 108 address specified 
by the system memory read address register in the player 
bus grouping of DMA stack 312, to the shift register 

15 502. Address generator 208 also increments the 
addresses in the system memory write address register 
and system memory read address register of the player 
bus grouping in the DMA stack 312, and decrements the 
length register in such grouping. The player bus 

20 control unit 504 then simultaneously shifts out the 32 
bits from the shift register 502 onto the player bus, 
and shifts in 32 new bits of data from the player bus 
into the shift register 502. At the completion of the 
shift, player bus control unit 504 again requests 

25 control of the D-bus and address generator 208 for 
another pair of DMA transfers to and from system memory 
108. Accordingly, the DMA address and length registers 
for the player bus channel operate in the same fashion 
as those for the audio DMA channels described above, 

30 except that looping is not enabled. The size of the 
shift register 502 is only one word, so the transfer 
burst size in each direction is only one word. 

Note that software running on CPU 102 needs to re- 
enable the player bus DMA channel once each display 
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field. This takes place automatically when the CPU 102 
writes a value into the length register of the player 
bus register grouping in DMA stack 312. The channel is 
automatically disabled when the length register reaches 
5 zero. The channel may also be programmed to interrupt 
the CPU 102 when the length register reaches zero. 

The useful length of the output stream and the input 
stream on the player bus are not directly related. 
Either one may be longer. Since one clock is used for 

10 both streams, non-valid input data could arrive and must 
be ignored. Additionally, the output stream may need to t 
be padded with leading zeros. Also, note that 
disablement of the player bus DMA channel and 
interruption of the CPU 102 when the length register 

15 decrements to zero, merely indicates that all necessary 
DMA transfers have taken place; player bus control 504' 
still must shift the word most recently transferred to 
shift register 502, out the player bus. 

20 F. Spryte En gine Data Transfers 

In order to render a spryte into an area of system 
memory 108, the CPU 102 first sets up the required data 
in a different area of the system memory 108. Such data 
includes 6 to 15 32-bit words as specified in Table II 

25 above, all located contiguously in system memory 108; 4, 
8 or 16 optional 32-bit words contiguously located in 
system memory 108 to represent PIP data; and spryte 
image data of any length. Of the 6 to 15 words 
specified in Table II, note that several groups are 

30 optional as set forth more fully below. Note also that 
the second word of the SCoB is a pointer to the next 
SCoB to process; thus, the CPU may create a linked list 
of any number of SCoBs to process in sequence, each 
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defining its own spryte source data, optional PIP data, 
and spryte rendering control information. 

Also prior to starting the spryte engine , the CPU 
102 writes desired information directly into certain 
memory-mapped hardware registers as follows: 

SCOBCTLO . 32-bit word defined in Table III below. 



10 



TABLE HI 

SCOBCTLO E ngine Cnntrnl Word 



15 



20 



25 



30 



35 



Bits 

B31:B30 



B15POS 



B29-328 = B0POS 



Description 

B15 oPEN selector for output of PMPP. (This bit can function as 
a supposition defining bit that is used by the pre-display 
interpolater.) 0=0, 1=1, 2=xx, 3= same as Source data 

B0 oPEN selector for output of PMPP. (This bit can also function 
as a subposition defining bit that is used by the pre-display 
interpolater.) 0=0, 1=1, 2= PPMP math, 3= same as Source data 



B27 = SWAPHV l=Swap the H and V subpositions prior to their entry into the 
PPMP 

B26 = ASCALL 1= Allow super clipping function (master enable switch) 
B25 = xx Reserved 

£24 » CFBDSUB l=use the H and V subposition bits of the cFB data in place of 
(vice) the spryte source values when the cFB data is selected as a 
PPMP source. (Note: CFBDsel=(Sl=l) OR (S2=2).) 



CFBDLSB cFBD PPMP Blue LSB source. 0=0, l=cFBD[B0], 2= cFBD[B4], 
3=x 



B23:B22 

B21S20 = IPNLSB IPN PPMP Bhic LSB source, 0=0, 1-IPN[B0], 2-IPN[B4], 3=x 



40 Note that when 'relative' has been specified in the 

flags for NEXTPTR, SOURCEPTR, or PIPPTR, the value that 
should (must) be placed in the SCoB is the word distance 
from the address in RAM that has the relative value in 
it to the address in RAM that is desired to be the new 

45 address MINUS POUR. (REL= Target - PC - 4 } . Also note 
that the B0POS value of *2' is the only setting that 
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uses PPMP math to control the BO bit in the actually 
output OPEN signal. When this setting is chosen, the 
Blue LSB will also be included in the input parameters 
of the black detector. 
5 REGCTLO . Controls the modulo for reading source 

frame buffer data into the primary and/or secondary 
input port of the spryte engine 214 and for writing 
spryte image result data from the spryte engine 214 into 
a destination frame buffer in system memory 108. The 

10 modulo effectively indicates the number of pixels per 
scan line as represented in the respective frame buffer 
in system memory 108. 

Only the low-order 16 bits of REGCTLO are used. 
The low- order 8 bits specify the modulo for the source 

15 frame buffer , and the next 8 bits specify the modulo for 
the destination frame buffer. For each of the two 
modulo designations , the low-order 4 bits specify a Gl 
value and the high-order 4 bits specify a G2 value. The 
modulo specified for a particular buffer is then 

20 calculated as G1+G2. Thus the following bits of REGCTLO 
are defined (CFBD refers to current frame buffer data, 
a source buffer separate from spryte source data, which 
the spryte engine may read as input data) : 
ftPggTfrQ PTT DESCRIPTION 

25 



30 



35 



0 


Gl=32 


for 


CFBD read buffer. 


1 


Undefined. 


Set to 0. 


2 


Gl=256 


for 


CFBD read buffer. 


3 


Gl-1024 


for 


CFBD read buffer. 


4 


G2-64 


for 


CFBD read buffer. 


5 


G2=128 


for 


CFBD read buffer: 


6 


G2=256 


for 


CFBD read buffer. 


7 


Undefined. 


Set to 0. 


8 


Gl=32 


for 


destination buffer. 


9 


Undefined. 


Set to 0. 


10 


Gl=256 


for 


destination buffer. 


11 


Gl-1024 


for 


destination buffer. 


12 


G2=64 


for 


destination buffer. 


13 


G2=128 


for 


destination buffer. 
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14 G2=256 for destination buffer. 

15 Undefined. Set to 0. 

The software must ensure that no more than one bit 
5 in each nibble is set. The hardware does not protect 
against setting more than one bit. If no bits are set 
in a nibble, the contribution to the resultant modulo is 
zero. 

REGCTL1 . X and Y clip values, effectively 

10 indicating the number of pixels in the X and Y 
dimensions which make up the frame buffer. Bits 26:16 
indicate the last writeable row (counting from row 0) in 
the Y dimension and bits 10:0 indicate the last 
writeable column (counting from col. 0) in the X 

15 dimension. All other bits must be zero. As an example, 
a value of 00EF013F indicates that the frame buffer data 
is represented in 320x240 format. 

REGCTL2 . Read base address. Indicates the address 
in system memory 108 of the upper left corner pixel of 

20 the source frame buffer data. 

REGCTL3 . Write base address. Indicates the 
address in system memory 108 of the upper left corner 
pixel of the destination frame buffer (CFBD). 

Also before the spryte engine is started, the CPU 

25 102 places the address of the first SCoB in the linked 
list into the DMA stack 312 register corresponding to 
"next SCoB fl . The CPU then writes to the memory mapped 
address designated SPRSTRT in order to start the spryte 
engine running. Once the spryte engine starts running, 

30 it retains exclusive control of the D-bus until either 
it finishes processing all the SCoBs in the list, or an 
interrupt occurs. If an interrupt occurs, the spryte 
engine continues to a convenient stopping point, then 
releases the D-bus. The CPU then vectors to an 

35 appropriate interrupt handler, and when done, returns to 
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the routine which originally started the spryte engine. 
That routine checks the status bit SPRON in a memory 
mapped status bit register to determine whether the 
spryte engine stopped due to an interrupt or due to 
5 completion of processing, and if the former, restarts 
the spryte engine. In an alternative embodiment, the 
CPU can have a separate bus to program memory, to 
thereby permit the CPU 102 to perform other tasks while 
the spryte engine 214 renders the sprytes. In the 

10 latter embodiment, the spryte engine can generate an 
interrupt for the CPU 102 when spryte processing has 
completed, at which time the CPU 102 can vector to an 
interrupt handler. 

The DMA stack 312 includes an 8-register grouping 

15 for spryte control. The eight registers are as follows: 

0 CURRENT SCOB ADDRESS 

1 NEXT SCOB ADDRESS 

2 PIP ADDRESS 

3 SPRYTE DATA ADDRESS 

20 4 ENGINE A FETCH ADDRESS 

5 ENGINE A LENGTH 

6 ENGINE B FETCH ADDRESS 

7 ENGINE B LENGTH 

25 When the CPU writes to the SPRSTRT address, after 

the spryte engine gains control of the system data bus 
118, 120, the DMA engine of Figs. 3 and 4 loads in the 
first six words from the first SCoB beginning from the 
system memory address specified in the NEXT SCOB 

30 register in the DMA stack 312. To accomplish this, the 
address of the first word to load is read out of the 
NEXT SCOB register and provided to the memory address 
lines via source multiplexer 314. The address is also 
incremented by adder/clipper 320 and written back via 

35 multiplexer 310 into the CURRENT SCOB register in DMA 
stack 312. All six words are read from memory in this 
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manner , the CURRENT SCOB register maintaining the 
address of each next word to load. 

The first SCoB word read, FLAGS, is written into a 
32-bit hardware register in address, manipulator chip 
5 106. The next SCoB word read, NEXTPTR, is written into 
the NEXT SCOB register in DMA stack 312. SOURCEPTR is 
written into the SPRYTE DATA ADDRESS register of DMA 
stack 312, and PIPPTR is written into the PIP ADDRESS 
register in DMA stack 312. XPOS and YPOS are written to 

10 two memory mapped hardware registers XYPOSH and XYPOSL, 
each having the format of x,y. That is, the high-order 
16 bits from XPOS and the high-order 16 bits from YPOS 
are written to the high- and low-order half words, 
respectively, of XYPOSH , and the low-order 16 bits of 

15 XPOS and the low-order 16 bits of YPOS are written to 
the high and low half words, respectively, of XYPOSL. 

After the first six words of the SCoB are loaded, 
depending on the bits which were set in FLAGS, up to 
seven additional SCoB words are loaded. The possible 

20 words are grouped as a single DMA transfer of up to 
seven words. If the LDSIZE bit of FLAGS was asserted, 
then the DMA controller of Figs. 3 and 4 expects the 
first four words of this group of seven to be DX, DY, 
LINEDX and LINEDY. These words are loaded in the same 

25 manner as the first six words of the SCoB, the 
incremented addresses being stored in the CURRENT SCOB 
register in DMA stack 312. DX and DY are written in x,y 
format into two memory mapped hardware registers DXYH 
and DXYL, and LINEDX and LINEDY are written in x,y 

30 format to two memory mapped hardware registers DDXYH and 
DDXYL. Note that if the SKIP bit of the FLAGS word 
equals one, indicating that the present SCoB is to be 
skipped, or if the YOXY bit is zero, then the X and Y 
values are not written to the hardware. 
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If the LDPRS bit of FLAGS was set, then the DMA 
control unit of Figs. 3 and 4 expects the first two 
words (or the next two words) of the optional seven to 
contain DDX and DDY. These are written in x,y format to 
5 two memory mapped hardware registers DDXYH and DDXYL. 

If the LDPPMP bit of the FLAGS word was set, then 
the DMA control unit of Figs. 3 and 4 expects the first 
(next) word of the optional seven words to be PPMFC. 
This word is written to a memory mapped PPMPC hardware 

10 register. 

After the second SCoB load of zero through seven 
words, the DMA control unit of Figs. 3 and 4 performs a 
preamble load of either one or two words. If the 
SCOBPRE bit of FLAGS was set, then the preamble word(s) 

15 is (are) assumed to be at the end of the SCoB, in which 
case the CURRENT SCOB register in DMA stack 312 contains 
the address of the first preamble word. If SCOBPRE was 
not set, then the preamble word(s) is (are) assumed to 
be at the start of the data, in which case the SPRYTE 

20 DATA ADDRESS contains the address of the first preamble 
word. The DMA control unit selects the appropriate 
register source for the starting address of the preamble 
load and returns the incremented addresses to the same 
register. 

25 The first preamble word is always present and is 

loaded into the appropriate hardware registers. The 
second preamble word is present only when the PACKED bit 
of the FLAGS word was zero, indicating that the spryte 
image data is in "totally literal" format. When the DMA 

30 unit reads this word, the information in the WOFFSET 
field is written to an offset register and the 
information in the TLHPCNT field is written to a pixel 
count register in the hardware. The offset indicates 
the width of the totally literal spryte in memory, and 
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is used by the DMA controller to calculate the start of 
each next line of the spryte source data. The pixel 
count indicates the number of pixel6 to be transferred 
on each scan line of totally literal spryte source data. 
5 These two values are settable independently in order to 
permit the transfer of only a rectangular portion of a 
larger bit image, smaller than the overall bit image 
both in width and height. 

After the preamble load, if the LDPIP bit of the 

10 FLAGS word was set, the DMA control unit will read out 
4, 8 or 16 words of PIN to IPN conversion information, 
beginning from the address in the PIP ADDRESS register 
in the DMA stack 312. Incremented addresses are also 
rewritten into the same DMA stack register. The number 

15 of 4-word bursts to perform (if any) depends on the data 
compression format of the spryte image source data, 
which is specified in the BPP ("bits per pixel") field 
of the first preamble word. In particular, as set forth 
in Table VIII of the related METHOD FOR CONTROLLING A 

20 SPRYTE RENDERING PROCESSOR application, a total of four 
PIP words will be loaded for BPP = 0, 1 or 2; eight PIP 
words will be loaded if BPP = 3; and 16 PIP words will 
be loaded if BPP =4, 5, 6 or 7. Note that since PIP. 
data in system memory 108 is referenced indirectly, the 

25 same PIP data may be downloaded from multiple SCoBs. 
Also, the PIP may be loaded even if it will not be used 
to decompress the current spryte image source data 
(which is the case if the LINEAR bit of the first 
preamble word is one). For loads of all 16 PIP words, 

30 the entire PIP is overwritten. For loads of fewer than 
16 words, the PIPA field of the FLAGS word indicates the 
starting address in the PIP for receiving the data. 

After the PIP load, the DMA unit of Figs. 3 and 4 
begins transferring spryte image source data from system 
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memory 108 to the data input FIFOs of the spryte engine 
214. As set forth above, if the spryte image source 
data is packed (i.e. not in totally literal format), 
then at the beginning of each scan line of spryte source 
5 data, the first one or two bytes of the first word 
contain a word offset from the start of the current line 
of source data to the start of the next line of source 
data, minus 2. This offset is used by the DMA 
controller both to calculate the start of each next line 

10 of source data and to set the length of a DMA transfer 
of the line of source data. Accordingly, the DMA 
controller reads this word from the address specified 
in the SPRYTE DATA ADDRESS register of the DMA stack 
312, incrementing the address and placing it into the 

15 ENGINE A FETCH ADDRESS register of the DMA stack 312. 
The high-order 8 or 10 bits of this word are placed into 
the ENGINE A LENGTH register of the DMA stack 312, and 
the entire word is also sent to the spryte engine data 
input FIFO for corner engine A. The spryte engine knows 

20 to ignore this word. Note that if the spryte image 
source data is in totally literal format, then the 
single offset value (as well as the pixel count value) 
specified in the second preamble word and described 
above applies to the entire spryte. 

25 After the offset is loaded, the DMA controller then 

transfers additional words of spryte image source data 
for the current scan line in bursts of up to four words 
each, as requested by the spryte engine 214. The 
starting address for each burst is found in the ENGINE 
.30 A FETCH ADDRESS register of DMA stack 312, and the 
incremented addresses are placed in the same register. 
Correspondingly, the DMA controller decrements the value 
in the ENGINE A LENGTH register of DMA stack 312 
according to the number of words transferred. 



WO 94/10641 M M PCT/US92/09349 



- 72 - 

The DMA controller also updates the SPRYTE DATA 
ADDRESS register in DMA stack 312 by adding the offset 
specified in the first word of the scan line, so that 
that register always points to the next scan line to be 
5 processed. 

Note that on the conditions described in the above- 
mentioned SPRYTE RENDERING SYSTEM WITH IMPROVED CORNER 
CALCULATING ENGINE AND IMPROVED POLYGON-PAINT ENGINE 
APPLICATION , the spryte engine 214 can also request DMA 

10 transfers of 4-word bursts of the next scan line of 
spryte image source data into the corner engine B data 
input FIFO, using the ENGINE B FETCH ADDRESS and ENGINE 
B LENGTH registers. In this manner, both corner engines 
A and B in the spryte engine 214 can operate on 

15 different scan lines of spryte image source data 
simultaneously , and the DMA controller can burst data to 
each as independently requested. 

VII. General O peration of the System 

20 On release from reset, the address manipulator chip 

106 maps the CPU ROM 132 into the bottom of the CPU 102 
address range (beginning at address zero) . This range 
is read only until it is remapped to system memory space 
during the ROM start-up procedure. The ROM start-up 

25 procedure clears all necessary hardware conditions to a 
predefined state/ determines the size of system memory 
108 and sets the appropriate hardware bits, and performs 
various hardware initialization processes. The start- 
up ROM code will also copy ROM code to RAM and instruct 

30 the address manipulator chip 106 to remap the address 
space accordingly. 

When a media change event occurs at one of the 
expansion bus devices, such as the insertion of a new CD 
into CD/CD-ROM player 148, the device asserts a signal 
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on expansion bus 142 to expansion bus interface 614. 
Expansion bus interface 614 sets some hardware bits 
which have the effect of preventing any commands issued 
by the hardware from reaching the device which had the 
5 media change. Only the CPU 102, running in a supervisor 
mode, can clear these bits, and it does so as part of a 
soft reset. An interrupt is sent to the CPU 102 , which 
begins executing a media change routine in the 
supervisor mode from CPO ROM 132. This routine executes 

10 the soft reset to clear the hardware bits. After 
release from the soft reset, the CPU continues executing 
the media change routine by accessing the media to 
determine format and validity. For example, if the CPU 
102 determines that a new CD inserted into CD/CD-ROM 

15 player 148 is in standard CD/audio format, then the CPU 
102 jumps to a routine which automatically causes the CD 
to be read, passed through audio subsystem 606, 
converted to analog form by audio/video output circuitry 
152, and driven onto the audio left and right output 

20 lines (Pig. 1). Alternatively, if the CPU 102 
determines that the newly inserted CD is in a standard 
photo-CD format, then it may jump to a routine which 
reads it into system memory 108 and writes it out via 
the S-bus 122, the video display path 618, and 

25 audio/video output circuitry 152, to the display. 

Still further, if the CPU 102 determines that the 
new CD is not in one of a set of predefined standard 
formats, then it is assumed to be a CD/ROM having 
software for execution by the CPU 102. Such software 

30 may be completely or partially encoded, such as by the 
method of U.S. Patent No. 4,405,829, 4,200,770, 
4,218,582 or 4,995,082, all incorporated herein by 
reference. The CPU 102 determines the authenticity of 
the data on the disk and copies a table of contents into 
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system memory 108. Starting software is then located on 
the disk and retrieved into system memory 108 either by 
direct CPU byte reads from the expansion interface 614, 
or by DMA transfers. The software contains instructions 
5 intended for execution by the CPU 102, and it may need 
to be decompressed either by the decompression 
co-processor 166 (Fig. 1) or by a routine running in the 
audio subsystem 606, before execution begins. With the 
exception of some routines which may already be present 

10 in system memory 108 (having been copied there from the 
CPU ROM 132), this software can contain instructions to 
cause the system of Fig. 1 to perform all of the various 
user-level multimedia operations described herein. 

As previously mentioned, software may be provided 

15 to the system of Fig. 1 instead by download via a cable 
interface to the expansion bus 142. In such a 
situation, all addresses and procedures described above 
with respect to CD/ROM originated software can be 
applied to the cable interface. Still further, as 

20 previously mentioned s the software can be downloaded 
from a cable via the video input device 160, transmitted 
over the S-bus during the horizontal blanking period, 
and stored in system memory 108. After authentication 
procedures are performed thereon, and any required 

25 decoding and/or decompression, it may be executed by CPU 
102. 

The invention has been described with respect to 
particular embodiments thereof, and it will be 
understood that numerous modifications are possible 
within its scope. 
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1 1. A multi-media system for use with a video 

2 display, comprising: 

3 a CPU; 

4 a memory having a parallel port and a serial port, 

5 at least said parallel port being read/write, said 

6 parallel port being coupled to said CPU over a parallel 

7 data bus; 

8 a graphics manipulation processor coupled to said 

9 memory over said parallel data bus; 

10 a video display path coupled between said serial 

11 port of said memory and said display; and 

12 arbitration means for arbitrating control of said 

13 parallel port of said memory from requests issued by 

14 said CPU and by said graphics manipulation processor, 

15 said graphics manipulation processor having higher 

16 priority in said arbitration than said CPU. 

1 2. A multi-media system having a CPU, a memory, a 

2 graphics manipulation processor, an audio manipulation 

3 processor, and a video output path different from said 

4 graphics manipulation processor; and 

5 means for loading software into said memory for 

6 said CPU to execute, said software including 

7 instructions for controlling said graphics manipulation 

8 processor, said audio manipulation processor, and/or 

9 said video output path, 

1 3. A multi-media system for use with a display, 

2 comprising: 

3 a memory containing video pixel color information 

4 for scan lines to be displayed on said display; 

5 a serial data bus coupled to said memory; 
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6 a video display path coupled to receive said pixel 

7 color information in coordination with the timing of 

8 said scan lines, over said serial data bus; and 

9 means for transmitting control information to said 

10 video display path over said serial data bus in 

11 coordination with said timing of said scan lines. 

1 4. A method for generating a video image, 

2 comprising the steps of: 

3 providing a first image description, said firBt 

4 image description including first display position 

5 information and first color information for each pixel 

6 in said first image, said first color information being 

7 in a first compressed format; 

8 providing a second image description, said second 

9 image description including second display position 

10 information and second color information for each pixel 

11 in said second image; 

12 applying said first image description to said 

13 second image description in a predefined transformation 

14 to generate a third image description in a memory, said 

15 third image description including third display position 

16 information and third color information for each pixel 

17 in said third image, said third color information being 

18 in a third compressed format; and 

19 transmitting a dynamically decompressed version of 

20 said third image to a display. 
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